DOCUMENT RESUME 



ED 301 838                                        CS 211 614

AUTHOR        Greenberg, Karen, Ed.; Slaughter, Ginny, Ed.
TITLE         Notes from the National Testing Network in Writing.
              Volume VIII, November 1988.
INSTITUTION   City Univ. of New York, N.Y. Office of Academic
              Affairs.
SPONS AGENCY  Fund for the Improvement of Postsecondary Education
              (ED), Washington, DC.
PUB DATE      Nov 88
NOTE          34p.; Abstracts of papers presented at the National
              Testing Network in Writing Conference (1988).
PUB TYPE      Collected Works - Conference Proceedings (021) --
              Collected Works - Serials (022)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   Abstracts; Computer Uses in Education; Essay Tests;
              Holistic Evaluation; Scaling; *Scoring; *Writing
              Evaluation; *Writing Research

ABSTRACT 

This newsletter contains 32 abstracts of
approximately 1000 words each of papers presented at the 1988
conference of the National Testing Network in Writing. Abstracts,
listed with their authors, include "Instructional Directions from
Large Scale K-12 Writing Assessments" (C. Chew); "Portfolio
Assessment across the Curriculum: Early Conflicts" (C. Anson and
others); "Revamping the Competency Process for Writing: A Case Study"
(D. Holdstein and I. Bosworth); "We Did It and Lived: A State
University Goes to Exit Testing" (P. Liston and others); "Proficiency
Testing: Issues and Models" (G. Gadda and M. Fowles); "Creating,
Developing, and Evaluating a College-Wide Writing Assessment Program"
(S. Groden); "How to Organize a Cross-Curricula Writing Assessment
Program" (G. Hughes-Wiener and others); "Presenting a Unified Front
in a University Writing and Testing Program" (L. Silverthorne and P.
Stephens); "Evaluating a Literacy across the Curriculum Program:
Designing an Appropriate Instrument" (L. Shohet); "Validity Issues in
Direct Writing Assessment" (K. Greenberg and S. Witte); "Reliability
Revisited: How Meaningful Are Essay Scores?" (E. White);
"Establishing and Maintaining Score Scale Stability and Reading
Reliability" (W. Patience and J. Auchter); "Training of Essay
Readers: A Process for Faculty and Curriculum Development" (R.
Christopher); "Discrepancies in Holistic Evaluation" (D. Daiker and
N. Grogan); "Problems and Solutions in Using Open-Ended Primary
Traits" (M. C. Flanagan); "The Implications of Using the Rhetorical
Demands of College Writing for Placement" (K. Fitzgerald); "Using
Video in Training New Readers of Assessment Essays" (G. Cooper); "WPA
Presentation on Evaluating Writing Programs" (R. Christopher and
others); "Developing and Evaluating a Writing Assessment Program" (L.
Boehm and M. A. McKeever); "The Changing Task: Tracking Growth over
Time" (C. Lucas); "Assessing Writing to Teach Writing" (V. Spandel);
"Reader-Response Criticism as a Model for Holistic Evaluation" (K.
Schnapp); "The Discourse of Self-Assessment: Analyzing Metaphorical
Stories" (B. Tomlinson and P. Mortensen); "The Uses of Computers in
the Analysis and Assessment of Writing" (W. Wresch and H. Schwartz);
"Legal Ramifications of Writing Assessments" (W. Lutz); "Some Not So
Random Thoughts on the Assessment of Writing" (A. Purves); "Research
on National and International Writing Assessments" (A. Purves and
others); "Teaching Strategies and Rating Criteria: An International
Perspective" (S. Takala and R. E. Degenhart); "Effects of Essay Topic
Variations on Student Writing" (G. Brossell and J. Hoetker); "What
Should Be a Topic?" (S. Murphy and L. Ruth); "Classroom Research and
Writing Assessment" (M. Meyers); and "Computers and the Teaching of
Writing" (M. Ribaudo and L. Meeker). (RS)



VOLUME VIII 



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY



TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



U.S. DEPARTMENT OF EDUCATION

Office of Educational Research and Improvement

EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

• This document has been reproduced as
received from the person or organization
originating it.

• Minor changes have been made to improve
reproduction quality.

• Points of view or opinions stated in this
document do not necessarily represent official
OERI position or policy.



NOTES FROM THE 

NATIONAL TESTING NETWORK 
IN WRITING 



NOVEMBER 1988 



A Project of The City University of New York and The Fund for the Improvement of Postsecondary Education



The National Testing Network in Writing, now in
its seventh year, numbers 3,000 members across eleven
countries. We are busier than ever collecting, cataloging,
and disseminating information and data on measures and
procedures used to assess students' writing skills. We are
grateful for your help in sending us materials from
testing programs, which, amazingly, are still
proliferating, as questions about when, how, and whether
to test writers continue to plague teachers and school
administrators.

From the beginning, NTNW has sought to help
members find answers to these questions. Our eight issues
of Notes, the book Writing Assessment: Issues and
Strategies, and our annual conferences have attracted
teachers, administrators, and assessment specialists from
institutions around the world to examine models and to
explore the impact of assessment on pedagogy, curricula,
and students.

The 1989 conference will be international in scope,
featuring noted researchers from eight countries who will
lead workshops and present their latest findings. The
conference, co-sponsored by Dawson College, will
take place Sunday, April 9th through Tuesday, April 11th
(to allow for a weekend in Quebec) at the Centre Sheraton
Hotel in Montreal, Canada. A new feature will be pre-
and post-conference workshops. (See the centerfold of this
issue for more information and a registration form.)

This issue of Notes continues the tradition of
publishing abstracts from the annual conferences. The
1988 conference was co-sponsored by the University of
Minnesota under the coordination of Chris Anson. Here
are the abstracts of all of the workshops and panels,
grouped according to themes: the first nine describe
models of successful writing assessment programs,
followed by eight that focus on models of scales and
scoring; nine abstracts examine the impact of writing
assessment on students, faculty, and curricula; and the
final six examine current research on writing assessment.

The theme of the upcoming 1989 conference is 
"Writing Assessment Across Cultures." We hope you 
will join us in Montreal in April. 

Karen Greenberg and Ginny Slaughter



We wish to thank the people who graciously contributed to this issue of Notes. First we thank all of the
Recorders whose summary reports make up the bulk of the issue. We also thank Chris Anson of the University
of Minnesota, Twin Cities for organizing a wonderful conference and Leslie Denny for carefully overseeing all of
the local arrangements. And we are grateful to Haeryung Shin and Roddy Potter for their invaluable editorial
assistance. Finally, our thanks to Harvey Wiener, University Associate Dean for Academic Affairs and Director of
CUNY's Instructional Resource Center, for supporting the production of NTNW Notes.



NOTES is published by the Instructional Resource Center, Office of Academic Affairs,
The City University of New York, 535 East 80th Street, New York, New York 10021






CONTENTS



MODELS OF EFFECTIVE WRITING
ASSESSMENT PROGRAMS

Instructional Directions from Large Scale K-12
Writing Assessments: CHARLES CHEW

Portfolio Assessment Across the Curriculum: Early
Conflicts: CHRIS M. ANSON, ROBERT L. BROWN,
& LILLIAN BRIDWELL-BOWLES

Revamping the Competency Process for Writing: A
Case Study: DEBORAH HOLDSTEIN & INES
BOSWORTH

We Did It and Lived: A State University Goes to
Exit Testing: PHYLLIS LISTON, JOHN MATHEW, &
LINDA PELZER

Proficiency Testing: Issues and Models:
GEORGE GADDA & MARY FOWLES

Creating, Developing, and Evaluating a College-
Wide Writing Assessment Program: SUZY GRODEN

How to Organize a Cross-Curricula Writing
Assessment Program: GAIL HUGHES-WIENER,
SUSAN JENSEN-CEKALLA, MARY THORNTON-
PHILLIPS & GERALD MARTIN

Presenting a Unified Front in a University Writing
and Testing Program: LANA SILVERTHORNE &
PATRICIA STEPHENS

Evaluating a Literacy Across the Curriculum Program:
Designing an Appropriate Instrument:
LINDA SHOHET

MODELS OF SCALES AND SCORING

Validity Issues in Direct Writing Assessment:
KAREN GREENBERG & STEPHEN WITTE

Reliability Revisited: How Meaningful Are Essay
Scores?: EDWARD WHITE

Establishing and Maintaining Score Scale Stability
and Reading Reliability:
WAYNE PATIENCE & JOAN AUCHTER

Training of Essay Readers: A Process for Faculty and
Curriculum Development: ROBERT CHRISTOPHER

Discrepancies in Holistic Evaluation:
DONALD DAIKER & NEDRA GROGAN

Problems and Solutions in Using Open-Ended
Primary Traits: MICHAEL C. FLANAGAN

The Implications of Using the Rhetorical Demands
of College Writing for Placement:
KATHRYN FITZGERALD

Using Video in Training New Readers of
Assessment Essays: GEORGE COOPER

THE IMPACT OF WRITING ASSESSMENT ON
STUDENTS, FACULTY, AND CURRICULA

WPA Presentation on Evaluating Writing
Programs: ROBERT CHRISTOPHER, DONALD
DAIKER, & EDWARD WHITE

Developing and Evaluating a Writing
Assessment Program: LORENZ BOEHM & MARY
ANN MCKEEVER

The Changing Task: Tracking Growth Over Time:
CATHERINE LUCAS

Assessing Writing to Teach Writing:
VICKI SPANDEL

Reader-Response Criticism as a Model for Holistic
Evaluation: KARL SCHNAPP

The Discourse of Self-Assessment: Analyzing
Metaphorical Stories: BARBARA TOMLINSON &
PETER MORTENSEN

The Uses of Computers in the Analysis and
Assessment of Writing: WILLIAM WRESCH &
HELEN SCHWARTZ

Legal Ramifications of Writing Assessment:
WILLIAM LUTZ

Some Not So Random Thoughts on the Assessment
of Writing: ALAN C. PURVES

CURRENT RESEARCH ON WRITING
ASSESSMENT

Research on National and International Writing
Assessments: ALAN C. PURVES, THOMAS
GORMAN, & RAINER LEHMANN

Teaching Strategies and Rating Criteria: An
International Perspective: SAULI TAKALA &
R. ELAINE DEGENHART

Effects of Essay Topic Variations on Student
Writing: GORDON BROSSELL & JIM HOETKER

What Should Be a Topic?:
SANDRA MURPHY & LEO RUTH

Classroom Research and Writing Assessment:
MYLES MEYERS

Computers and the Teaching of Writing:
MICHAEL RIBAUDO & LINDA MEEKER






INSTRUCTIONAL DIRECTIONS FROM LARGE 
SCALE K-12 WRITING ASSESSMENTS 

Speaker: Charles Chew, New York
State Department of Education

Introducer: Marie Jean Lederman,
Baruch College,
CUNY and NTNW

It is now generally agreed that (1) direct assessment
of writing should, if possible, approximate what we
expect when students write; (2) learning to write is a
process which takes place over time, sometimes
recursively; and (3) requiring students to write whole
discourses is a better assessment tool than the objective
test of discrete skills. In 1979, when New York State
first instituted a writing competency test including three
writing samples, it was virtually alone in its attempt to
assess students' writing ability through multiple writing
samples. Five years have passed since the initiation of
the first Regents Competency Test in Writing. The
program has grown to encompass not only the eleventh
grade but the eighth grade and fifth grade as well. Since
September of 1985, G.E.D. diploma candidates must also
write an essay.

The Writing Test for New York State Elementary
Schools administered at grade 5 comes very close to
approximating the composing process. Students are
required to write two different pieces for the test on two
different days. A prewriting section precedes the writing
sample and is not evaluated. Students draft a response and
then redraft. In the tests at the secondary level, the
Preliminary Competency Test in Writing at grade 8
requires students to write three pieces, as does the Regents
Competency Test in Writing, which is administered in
grade 11 and is a requirement for graduation. The
Comprehensive Examination in English, which is
administered usually to average or above-average students,
requires two writing samples.

If tests are to approximate the reality of the writing 
process and are to have a positive effect on instruction, 
they need to require many types of writing. The outline
below shows the types of writing assessed in New York 
State Writing Assessment programs. 

Types of Writing

Writing Test for New York State Elementary Schools

    Personal Expression   The writer recounts a feeling or an emotion.

    Description           The writer describes a person, object, or place.

    Process               The writer explains how to do something.

    Story Starter         The writer completes a story which is started
                          in the writing prompt.

    Personal Narrative    The writer recounts an experience which he/she
                          had.

Preliminary and Regents Competency Test in Writing

    Business Letter       The writer at eighth grade writes a letter
                          ordering something. At grade 11, the writer
                          composes a letter of complaint and suggests how
                          to remedy the situation.

    Report                The writer takes data supplied and prepares a
                          report for another person or the class.

    Persuasive Discourse  The writer attempts to persuade the reader to
                          take some action by stating the action to be
                          taken and giving reasons why such action should
                          be taken.

Comprehensive Examination in English

    Essay                 The writer, using literature which has been
                          read, responds to a given question which is
                          generic in the sense that a wide variety of
                          literature could be used in the response.

    Composition           The writer can choose to do one question from
                          among eight. Two of these are situations which
                          provide a purpose and audience. The other six
                          are discrete topics which require a full
                          rhetorical invention by the writer.



The methods of evaluation used in this testing
program also speak to instruction. Students' writing
samples are evaluated holistically at grade 5, and modified
holistic scoring is used with all the other tests. This
rating procedure delivers a message to anyone in the state
who is involved with the testing program. The idea that
the whole piece of writing may be worth more than any
one single feature is an important message to teachers
who have for years spent an inordinate amount of time
"red-penciling" errors in students' work. Many of the
criteria used to evaluate the writing samples are virtually
the same for grades 5-11, indicating that these elements
are seen as essential in a competent piece of writing. The
fact that the criterion focusing on mechanics is not at the
top of the list reminds teachers that mechanics, although
important, are not the "be all and end all" of written
discourse. The fact that the tests are unlimited in time and
that length is merely suggested delivers additional
messages about the teaching of writing.

Any student who falls below the State Reference
Point on the writing tests is required by the state to
receive additional or remedial instruction. Parents must be
notified of the student's grade and must be informed of the
remedial program established for the student. These 
programs must begin no later than one semester after the 
administration of the test. Students can be removed from 
remediation if it can be documented that deficiencies have 
been overcome. At the senior high school level, students 
must pass the Regents Competency Test in Writing in 
order to receive a diploma. 

To meet the needs of educators at the local school
level in rating the tests and to devise instructional
strategies to meet student needs in writing, the Bureau of
English and Reading Education developed a two-year in-
service program. The first phase of the program
identified fifty key teachers or supervisors, representing
geographic areas of the state, who came to Albany for a
two and one-half day intensive training program. This
program focused on rating procedures, developing a
workshop agenda, and actually simulating the role of
workshop leader. When these fifty people returned to their
local areas, they in turn trained teachers from the local
schools who were involved directly with students affected
by the writing tests. The success of the program was
confirmed by the evaluations done by workshop
participants. The sampling by the Bureau of ratings of
test papers done locally attested to the reliability of local
rating.

Because of the success of our assessment program,
there is a reluctance to make changes. In New York we
have sensed a need to change the examinations for a
number of years. After extensive discussion, pretesting
and field testing, changes in the examinations will begin
in the 1988-89 school year. Part III of the Preliminary
Competency Test will be changed to reflect the revised
composition curriculum for New York State, and the
purposes for writing will rotate among those covered in
this curriculum material. Evaluation of the samples will
no longer require model answers. Although rating will be
done in much the same way, raters will rely on criteria
only. This change goes into effect for both the PCT and
RCT. In January 1989, the format of the business letter
on the Regents Competency Test in Writing will change
and information needed by the test taker will be in note or
outline form. This change will require the student to
process the demands of the task and formulate a response
rather than simply reword the task. In January 1992, Part
III of the RCT will change to follow the change begun in
the PCT.

I conclude by pointing out some problems,
questions, and concerns which still need to be addressed by
test makers and others interested in improving students'
writing ability. These are as follows:

(1) Samples of students' writing for evaluation and 
instructional purposes must be obtained 
throughout the school year, not only at test time. 

(2) Research needs to focus on the development of 
writers over time. 

(3) Research needs to determine if skills differ 
appreciably for various types of writing in a test 
situation. 

(4) Research needs to ascertain the relationship 
between writing done during a test and that done 
by the student at other times. 

(5) Writing prompts may not tap the experience of
the writers.

(6) Instruction can be limited to test items. Students 
may spend an inordinate amount of time writing 
business letters and structured responses to 
literature. 

(7) Evaluators using holistic scoring may not 
appreciate the fact that more must be done with 
student papers to plan instruction. 

(8) Once common elements of competent writing 
have been identified, instructional strategies need 
to be developed which will enable teachers to 
focus on these elements in a total language 
approach. 

(9) In-service programs connected to a test may be
limited when compared to extensive needs of
teachers.



Those of us involved in the assessment of writing know 
how much more we know today than we knew just a few 
years ago, but there is still much to be learned. 



PORTFOLIO ASSESSMENT ACROSS THE 
CURRICULUM: EARLY CONFLICTS 

Speakers: Chris M. Anson, Robert L. Brown, Jr.,
and Lillian Bridwell-Bowles, University
of Minnesota

Introducer: Virginia Slaughter, CUNY

Faced with a mandate to begin assessing students'
writing at the University of Minnesota, members of the
Program in Composition and Communication there
finally convinced an interdisciplinary task force that a
cross-curricular portfolio assessment would be the only
way to bring about large-scale changes in the quantity and
quality of writing instruction beyond their own writing
program. In this session, the speakers shared pieces of an
ongoing cultural critique that focuses on the political,
curricular and ideological contexts in which they are
struggling to turn a potentially damaging process into a
method for empowerment, enrichment, and educational
change.

Currently, the University of Minnesota plans to 
require applicants to submit a high school portfolio as 
part of the admission requirements. These portfolios 
require samples of writing from several subject areas as a 
way to encourage writing across the curriculum in the 
high schools. Throughout their college years, students 
will continue to build on their portfolio until they are 
juniors, at which time their major department will be 
responsible for assessing the quality of their writing for 
exit from junior-year status. Increased attention to
writing, including new composition courses, writing-intensive
courses across the curriculum, and trial
assessments before the junior year, will provide support
for the assessment program. Composition faculty will 
take on a greater consultative role to help departments 
incorporate writing into their curriculum and to help them 
establish methods for the portfolio assessment. 

Chris Anson described the University's plans for this
assessment as these are outlined in the 1987 report of the
Task Force on Writing Standards. Reactions to the report
were solicited from departments and colleges at the
University of Minnesota, from 143 secondary school
teachers and administrators across the state, and from
assorted other readers, including personnel at the
Minnesota State Department of Education and local
professionals. Anson's close reading of these readers'
responses to the report revealed a more positive attitude
toward instruction among secondary teachers than among
teachers at the University itself. In comparison to college
faculty, the secondary teachers showed a deeper
understanding of the relationship between testing and
teaching, expressed fewer fears about increased workload,
and worried more about the potential hazards of testing
when it does not support enhanced instruction. Using
quotations from several responses, Anson showed how
faculty members' views of writing assessment are
saturated not only by their tacit endorsement of the
surrounding academic values of their institution, but also
by the more specific ideological perspectives of their
discipline.

Anson explained the resistance to portfolio
assessment among the college faculty by describing the
institutional ethos at Minnesota, a university that
privileges research and publication and de-emphasizes
undergraduate education. After exploring some of the
ideological reasons why university faculty resist rich types
of assessment and accept simplistic types (such as
multiple-choice tests of grammar skills), Anson argued
that before a writing assessment program can be
implemented successfully, administrators must study and
understand the academic culture that surrounds the planned
assessment. Armed with this knowledge, administrators
can plan ways of implementing rich assessment programs
without facing the sort of resistance that can lead to
impoverished tests and instructional decay.

Central to these understandings is an awareness of
the relationship between writing programs and the larger
academic culture. Composition teachers and
administrators in radical writing programs are change
agents, whose political praxis must be consciously
grounded in theory or run the risk of becoming ineffectual,
or worse, of merely reinscribing the ideologies they seek
to change. Beginning with this premise, Robert Brown
set out to raise theoretical questions central to such praxis.
An adequate theory, he claimed, would be hermeneutic,
and might take as its text the university itself, in its
several manifestations: the behaviors of its members, its
constituting texts, and its organizational structures. The
university-as-text speaks of knowing and knowledge:
their nature, value (economic and otherwise), creation and
social utility. We might profitably read this text through
the reciprocal processes defined in radical ethnography. If
we do, we can simultaneously explicate the bureaucratic
forces we encounter in attempting to build genuine
literacy programs, and our own culture-specific ideologies.

Creating change requires ongoing dialogue across the
curriculum about such issues as standards vs. individuality
and creativity; program assessment vs. individual growth;
and the place of writing instruction in the rise of the new
professionalism vs. the liberal arts education. Arguing
that change is possible with the right incentives for
faculty, Lillian Bridwell-Bowles concluded the session by
outlining some of the assessment activities underway at
Minnesota. These include a study of "strong, typical and
weak" writing samples across the undergraduate
curriculum, studies of writing in "linked courses" which
combine composition instruction with content learning,
and planning the implementation of portfolios as a
requirement for admission. The newly endowed Deluxe
Center for Interdisciplinary Studies of Writing will
provide ongoing research funds for faculty interested in
five categories: the status of writing ability during the
college years; characteristics of writing across the
curriculum; the functions of writing in learning;
characteristics of writing beyond the academy; and
curricular reform in undergraduate education. Other efforts
to improve the context for the planned assessment include
early pilot projects for portfolio assessment that have been
conducted in 18 Twin Cities Metropolitan school districts,
and collaborative writing assessment projects that are part
of the Alliance for Undergraduate Education, a consortium
of 13 public research universities.



REVAMPING THE COMPETENCY PROCESS FOR 
WRITING: A CASE STUDY 



Speakers: Deborah Holdstein, Governors State
University, Illinois
Ines Bosworth, Educational Testing
Service, Illinois

Introducer/
Recorder: Lu Ming Mao, University of Minnesota

Deborah Holdstein began by describing Governors
State University, a junior, senior, and graduate institution
with a diversified student body. The University used to
have a writing competency test for prospective juniors and
seniors. Accompanying this competency test were a set
of grading standards and a pass-and-fail system, both of
which had continually drawn criticism from test readers
and scorers alike, because they were vague and not
academically sound. According to this set of standards, a
passing essay must (1) respond to the stated topic; (2)
have a clearly stated thesis; (3) show clear, logical
organization of ideas in organized, well-developed
paragraphs; (4) include supporting details; and (5) demonstrate
one's editing ability.

Holdstein was asked to change this system. She
observed that the process of revamping an assessment
program was as political as it was academic. One
misperception bandied around a lot was that the English
teachers were determined to flunk students. Holdstein
recalled that they needed someone from outside, an expert
with no stake, political or otherwise, in the system, to
help teachers revamp the system. Ines Bosworth from
Educational Testing Service was brought in as a
consultant. Bosworth emphasized that as a neutral
observer, she was able to get different opinions from
faculty in different departments. These discussions
became extremely useful because they enabled faculty to
articulate their concerns about possible changes in the
testing program. Out of these discussions, and the
Provost's unfailing support, came the new scoring criteria,
which have four major areas: focus, organization,
elaboration (support), and conventions (mechanics). These
are scored with a 6-point scale, 6 being superior and 1
being seriously inadequate. This scale replaced the old
pass/fail scale.

Holdstein noted that the number of questions on the
test was reduced from 5 to 3 (although the test time is
still 60 minutes) in response to students' complaints that
the number of tasks on the old test forced them to spend a
lot of time reading and figuring out questions instead of
actually composing. One of the three new questions reads
as follows:

"Matrimony is a process by which a grocer
acquired an account the florist had." What does
this quote say about the transition from single to
married life? Is it accurate? How so, or how
not? Again, be sure to formulate a thesis with
your point of view, and use specific examples to
back up your points.

Both speakers noted that one of the many merits of
this new competency test is that readers can more easily
score each essay according to the criteria. Moreover, the
new system is fairer than the old one. Under the old
system, whenever there was a split in "failing" or
"passing" decisions, a 3rd reader was consulted. Under the
new system, each essay receives four readings, and readers
do not know whether they are the 3rd or 4th reader; thus, a
lot of political heat is removed. Readers also provide
students with an "analytic checklist," which informs them
of the criteria used, the weaknesses in their essays, and
comments from the readers.

Bosworth commented that the interrater reliability
of the new test is 92% (as opposed to 73% in the old test)
and that more students have passed the new competency
test than before. However, Holdstein pointed out that
most questions in the new test tend to be too content-laden
and that the scoring criteria are too heavily weighted
toward content. Nevertheless, both speakers noted that the
new test has proven to be far more effective than the old
one and has fostered faculty collaboration.
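
The abstract does not say how the 92% and 73% reliability figures were computed. Purely as an illustration, and under the assumption (not stated by the speakers) that reliability means the share of reader pairs who give an essay the same score on the 6-point scale, such a figure could be obtained along the following lines. The function name and the sample scores are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(scores_by_essay):
    """Share of reader pairs, pooled over all essays, that gave identical scores.

    scores_by_essay is a list of lists: one inner list of reader scores per essay.
    This exact-agreement measure is an assumption, not the speakers' stated method.
    """
    agreeing_pairs = 0
    total_pairs = 0
    for scores in scores_by_essay:
        # Compare every pair of readers who scored the same essay.
        for first, second in combinations(scores, 2):
            total_pairs += 1
            if first == second:
                agreeing_pairs += 1
    if total_pairs == 0:
        raise ValueError("need at least one essay with two or more readings")
    return agreeing_pairs / total_pairs


# Two made-up essays, each read four times on the 6-point scale.
sample = [[4, 4, 4, 4], [3, 3, 3, 4]]
print(f"{pairwise_agreement(sample):.0%}")
```

Other definitions, such as counting scores within one point as agreement, would give different figures, so a reported percentage is only comparable across tests when the same definition is used.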



WE DID IT AND LIVED: A STATE
UNIVERSITY GOES TO EXIT TESTING

Speakers: Phyllis Liston, John Mathew, Linda
Pelzer, Ball State University

Introducer/
Recorder: Joyce Malek, University of Minnesota

In Fall 1987, Ball State University (Muncie,
Indiana) implemented exit testing for writing competency
as a prerequisite for graduation. The three-member panel
responsible for establishing the rubrics and coordinating
the testing and holistic grading discussed what they learned
during this first year. Participants were given hands-on
experience with the exam by writing briefly in response to
a sample writing test essay assignment and discussing the
process they went through to begin answering the essay
question. They then ranked actual essays and were led
through the process the panel uses to develop the rubric.

Phyllis Liston began by describing what the exam
coordinators learned in the process: (1) implementing,
coordinating and gaining community-wide acceptance for
exit exams is "a lot harder than it looks"; (2)
communication at all levels is essential; (3) low-level
mistakes can cause high-level difficulties; (4) money when
needed is found; and (5) holistic grading works well. In
addition, the exam needs full administrative and faculty
support. As the director of the writing competency exam,
Liston found the administrative duties to be a full-time
responsibility requiring personnel assistance.

Liston explained Ball State's "3/3/3" exam process.
Students sign up for the exam three weeks before the
exam date and are given an instruction sheet detailing the
exam process, the exam question, how to prepare for the
exam, and where to go to receive help preparing for the
exam. On the exam date, students are given three hours to
write approximately three pages in response to the exam
question. Students must pass the test to graduate. After
two attempts, they are required to enroll in, and repeat
until they pass, an upper-division writing course. The
second opportunity to take the exam constitutes an
automatic appeal. Exit from the course is by portfolio
prepared by the students with the help of their instructors.
Portfolios are evaluated by two or more readers other than
the classroom teacher/coach. No student takes the exam
more than twice.

John Mathew explained the training process for 
holistic graders by taking participants through a mini 
grading workshop. We read and ranked three sample 
essays high, middle and low. Then we read, ranked and 
integrated into the previous essays three more, and did the 
same for two additional essays. We then discussed our 
ranking of one of the essays in terms of its strengths and 
weaknesses. Finally we were presented with the six-point 
rubric developed by the panel for the particular exam and 
were asked to rate the essay. 

In an actual reading, graders read ten papers at a time, 
assess, record and score, and pass the papers on to a second 
reader. Papers with scores that do not match are given to 
a third reader. All pass decisions are made by the 
University Provost under the advisement of the panel and 
other administrators after all exams for the quarter have 
been scored. The panel acknowledges a high reader 
calibration and suggests a main reason for it is that readers 
do not know the cut-off point for failing, and therefore are 
more objective and not sympathetically influenced to pass 
a border-line paper. 
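
The routing step in this reading process can be sketched in a few lines of Python. This is an illustration only: the function names, the sample paper IDs and scores, and the reading of "scores that do not match" as any difference at all are assumptions, and the summary does not say how a third reading is combined with the first two.

```python
from typing import List, Tuple


def needs_third_reading(first: int, second: int) -> bool:
    """Return True when two readers' scores on the six-point rubric differ.

    Assumption: "do not match" means any difference; the panel's actual
    tolerance is not stated in the session summary.
    """
    for score in (first, second):
        if not 1 <= score <= 6:
            raise ValueError(f"score {score} is outside the six-point rubric")
    return first != second


def route_papers(scores: List[Tuple[str, int, int]]) -> List[str]:
    """List the IDs of papers that would be handed to a third reader."""
    return [paper for paper, a, b in scores if needs_third_reading(a, b)]


# Hypothetical batch: (paper ID, first reader's score, second reader's score).
batch = [("A01", 4, 4), ("A02", 3, 5), ("A03", 2, 2), ("A04", 6, 5)]
third_reader_pile = route_papers(batch)
print(third_reader_pile)
```

Note that the sketch never consults a passing score, which mirrors the panel's point that readers do not know the cut-off.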

Linda Pelzer described the rubric design process. The 
panel develops a new rubric for each exam by reading and 
sorting all essays written for the exam into high, middle 
and low categories. After sorting, they discuss the 
categories and write about them, and they draft a six-point 
rubric, one that is quite detailed and descriptive and that 
includes specific examples from student papers to 
illustrate the rubric's categories. A six-point rubric is 
used because it eliminates a middle score and because a 
four-point rubric would not be specific enough to 
encompass the aspects of the writing they wish to assess. 
The panel takes care and time in designing the rubric to 
make it clear and specific in order for readers to reach 
consensus and to withstand criticism from students, 
parents and faculty. Rubrics are kept on file at the 
University library. One indicator of the success of the 
rubric is that students who fail the exam and wish to 
contest it usually reach agreement after examining the 
rubric and evaluating their own writing against it. 

Although the writing competency examination 
project is bigger than the panel first anticipated, they agree 
that it is worth the work. 



PROFICIENCY TESTING: ISSUES AND 
MODELS 

Speakers: George Gadda, University of California, 
Los Angeles 

Mary Fowles, Educational Testing 
Service, New Jersey 

Introducer/ 

Recorder: Adele Hansen, University of Minnesota 

George Gadda opened the discussion with a statement 
concerning general issues in developing a proficiency 
testing program. Proficiency testing, like achievement 
testing, measures success in a particular domain. There 
are several motivations for proficiency testing: to certify 
individual achievement exclusive of grades, to validate a 
program's effectiveness, or to screen before certification of 
passing to the next level of instruction. The choice of 
purpose governs the rest of the assessment program. 
Proficiency tests may be used to exempt students from 
further work; to prove value added in a course program; to 
permit passage, graduation or certification; or to identify 
those who need further instruction. 

Gadda noted that test-makers should define the 
domain of the test by describing the kind of written ability 
being assessed and that we should make a public statement 
concerning the criteria used for judgment. Tests used for 
advancement should be a well-defined part of the 
curriculum, with samples and grading criteria clearly 
described. Ideally, scorers should be those people who are 
testing and using the results. In addition, we need to 
determine what will happen to those who don't pass. 
Gadda noted that proficiency tests should not be a 
"roadblock." He concluded by stating that we should 
strive for high reliability and validity in our testing 
because proficiency tests need to withstand legal 
challenges. 

Mary Fowles remarked that we need an increased 
understanding of what is to be tested and that the 
"community" must share the same standards. She referred 
to a project in Rhode Island, where a state administrator 
decided to work on literacy beginning in the third grade. 
ETS was asked to construct a test that encouraged good 
writing. They worked with local administrators and 
teachers from every school district in the state to 
formulate a writing test which was administered to all 3rd 
graders. The test featured a pre-writing section and then an 
essay test. It also included an editing phase, where 
students were given specific questions about content. 

Fowles described how scorers were trained. Every 
district in the state was represented in training sessions, 
and benchmark papers were identified and then used to 
train local raters. After the results were tabulated, the 
teachers returned to the classroom and showed examples of 
good papers to the students and discussed the scoring 
criteria. Next, the state decided to develop a portfolio of 
such "assignments" to validate the scores on the "test" and 
to enhance teaching. 

In the discussion that followed, questions were raised 
concerning the "read and respond" type of test. Gadda 
agreed that such a test does assess reading as well as 
writing, but that there is a connection and such tests are 
useful to determine the students' basic ability to do 
university level work. He added that such tests seem most 
fair, because all students begin with the same information 
and the students can then better understand the testing 
situation. He cautioned that such tests should always be 
pre-tested to discover if the reading is "accessible and 
interesting" and if the assignment elicits more than one 
response, because this can affect raters' evaluations. 



CREATING, DEVELOPING, AND 
EVALUATING A COLLEGE-WIDE WRITING 
ASSESSMENT PROGRAM 

Speaker: Suzy Groden, University of 
Massachusetts, Boston 

Introducer/ 

Recorder: Geoffrey Sirc, University of Minnesota 

In this session, Suzy Groden reported on the 
University of Massachusetts' writing assessment program, 
on-going since 1978. She described how it was 
developed, changed, and validated. The exam is a "rising 
junior exam," required of students after 68 credits (or 
within first semester for transfer students). Called a test 
of writing proficiency, the exam really tests reading, 
writing, and critical thinking because students have to 
respond to questions on texts (or "reading sets") with 
which they are provided one month prior to the exam. 
Students are either judged proficient or must remediate 
their writing skills. 

The idea behind the test is to teach students what the 
faculty want them to know in various core curriculum 
courses, courses designed to include elements of critical 
analysis and reading/writing associated with that 
discipline. Each reading set is 20 pages and concerns a 
controversial topic associated with a specific discipline. 
Students choose one of three sets, from natural sciences and 
mathematics, the social sciences, or the humanities, and 
they read the set for a month. After the first exam was 
given in 1978, a sample for students became available and 
a student manual was developed. 

Groden stated that one problem in the exam is the 
lack of a penalty for those who fail. The exam is graded 
by readers who are trained in one morning and then read 
exams all afternoon. A student needs two readers to agree 
in order to pass, and three readers to agree in order to fail. 
But there are actually no practical penalties now associated 
with failing the exam: students can still take upper- 
division courses if they fail, and there is now an alternate 
way to demonstrate proficiency: a portfolio. 
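
The asymmetric agreement rule (two readers to pass, three to fail) can be sketched as a small Python function. This is a hypothetical illustration: the function name and the assumption that readings are taken one at a time until either threshold is reached are not from the session summary.

```python
from typing import Iterable, Optional


def decide(votes: Iterable[bool],
           pass_needed: int = 2,
           fail_needed: int = 3) -> Optional[str]:
    """Apply the two-to-pass, three-to-fail rule to successive readings.

    Each vote is True for "proficient" and False for "not proficient".
    Returns "pass" or "fail" as soon as a threshold is reached, or None
    if the readings so far are not enough to decide.
    """
    passes = fails = 0
    for vote in votes:
        if vote:
            passes += 1
        else:
            fails += 1
        if passes >= pass_needed:
            return "pass"
        if fails >= fail_needed:
            return "fail"
    return None


print(decide([True, True]))                 # two readers agree on a pass
print(decide([False, True, False, False]))  # three readers agree on a fail
print(decide([False, True]))                # split so far, needs more readings
```

Under this reading of the rule, a failing result always takes more reader time than a passing one, which is the design choice the rule encodes: the benefit of the doubt goes to the student.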

During the course of subsequent years, changes 
occurred in the context of the exam. After Groden and the 
university's ESL Director became involved, an interest in 
writing and the acquisition of language found its way into 
the readings. Policies surrounding the implementation of 
the test were gradually loosened. The use of the portfolio 
alternative was extended, particularly to ESL students. 
Also, the range of writing samples included in the 
portfolio was expanded to include more than just the 
traditional analytical paper: lab reports, for example, 
would be accepted. Students were allowed three hours to 
write the exam, rather than just two. And in one of her 
more striking findings, Groden found students wrote much 
more easily when they switched from the standard-size 
blue books to the larger, 8-1/2-inch size (that being 
the standard in which they most frequently composed). 
The exam committee also spent more time thinking about 
readings and questions; the exams became more 
complicated, involving ideas about the nature of 
knowledge. What ultimately evolved were two possible 
questions, one for the non-intellectual and one for the 
more challenging intellect. Finally, they also offered an 
evening session for taking the exam. 

There were also many changes over the years which 
Groden termed losses. Faculty involvement waned, with 
more and more responsibility for grading falling to the 
exam committee. The school changed, taking in fewer 
freshmen and more transfer students, with the exam 
becoming a kind of graduation test. Funding dried up, 
causing the university to retreat from its core curriculum 
and limit the number of its core courses, and, hence, 
severing the relationship between the curriculum and 
writing proficiency. 

One area in which the Massachusetts exam 
developers were successful was in establishing grading 
criteria. Sending exam samples to national experts, 
Groden received strong validation and agreement on their 
criteria. The experts, however, were critical of the number 
of questions, feeling they made different cognitive 
demands and were unfair. The committee is continuing to 
revise the exam. 






HOW TO ORGANIZE A CROSS-CURRICULA 
WRITING ASSESSMENT PROGRAM 

Speakers: Gail Hughes-Wiener, Susan Jensen- 
Cekalla, Gerald Martin, Mary 
Thornton-Phillips, Minnesota 
Community College System 

Introducer/ 

Recorder: Julienne Prineas, University of 
Minnesota 

The speakers began the session by describing how, 
with the support of a Bush Foundation Grant, the 
Minnesota Community College System (comprised of 18 
two-year colleges scattered across the state) has been 
engaged in a three-year project to assess the effects of their 
Writing Across the Curriculum (WAC) program on 
faculty and students, especially on student learning. The 
four speakers described their separate but overlapping roles 
in the project, with a view to communicating the 
complexity of implementing this type of project. Mary 
Thornton-Phillips' role has been to design and establish 
the broad structure of the project. Susan Jensen-Cekalla, 
as the WAC Program Coordinator, has served as a bridge 
between the evaluation project and the faculty out in the 
colleges. Gail Hughes-Wiener's role as Evaluation 
Coordinator is to ensure that all of the components of the 
evaluation (such as surveys, interviews, and essay exams) 
are designed, coordinated, implemented, analyzed and 
communicated. Gerald Martin, as research analyst, is in 
charge of processing the data. 

Hughes-Wiener pointed out the need to budget for an 
immense amount of administrative, interpersonal, and 
program development work required prior to any actual data 
analysis or report writing. Her experience has been that 
no one, including consultants prominent in the field of 
program evaluation, anticipated the amount of work and 
time needed for this preparatory work. The scope of the 
project demonstrates its complexity: data must be 
collected on faculty attitudes, student attitudes, and student 
learning. The project required the careful development of 
questionnaires and surveys, the effective training of 
interviewers, and the preparation of holistic scoring terms. 
In addition, the project leader had to build credibility and 
trust among program participants and become 
knowledgeable about all needed information. 

Thornton-Phillips commented that their progress has 
been aided by a clear sense of direction, despite uncertainty 
as to how to achieve their goals. Allowing for flexibility 
and change within a general framework has proved 
necessary and productive. For example, faculty 
involvement was essential to the success of the project. 
Faculty had to become trained, knowledgeable participants 
who understood the research and their role in it. Thus, 
Thornton-Phillips' first challenge was to assess the needs 
and interests of faculty in an attempt to generate strong 
staff commitment and to develop a core faculty able to 
provide leadership for the program. The task was hindered 
by the voluntary nature of staff development in the 
Community College System and by the tendency to cut 
funds for such development during budget crises. 
Thornton-Phillips found the catalyst for the change needed 
in a dedicated Joint Faculty/Administrative Staff 
Development Committee and in a small group of faculty 
who had worked together for several years on 
implementing "Writing Across the Curriculum." Jensen- 
Cekalla joined the team as program coordinator, leaving 
Thornton-Phillips free to work on budgeting, staffing, and 
scheduling aspects. Together, they refined the assessment 
component and developed a reliable approach to reassure 
faculty. 

In her role as the most direct connector between 
faculty and the evaluation project, Jensen-Cekalla's 
foremost concern has been that all participants work 
together and coordinate their efforts. A cornerstone of the 
project is a summer workshop, which brings together 
faculty from all eighteen colleges. Follow-up meetings 
during the year provide the support and opportunity for 
exchange of information needed to maintain a united WAC 
teaching approach, and the grant provides all teachers with 
funds for a variety of supportive measures such as tutors, 
materials and supplies for the Learning Centers, and 
outside and in-house consulting. Jensen-Cekalla has also 
had to organize the data flow of information and resources 
from the evaluation out to people in the colleges. 

Martin's roles have included data analyst, data 
processor, in-house statistical research consultant, and 
resident skeptic. With the project now in its fourth year, 
the time has arrived to renew the research grant and inform 
the granting agency of the progress made. Martin noted 
that a project of this type raises many issues along the 
way. Its original purpose was to look at student 
outcomes, such as specific changes in writing proficiency 
and the learning of subject matter. However, several other 
desirable outcomes not in the original proposal have 
become obvious. Hughes-Wiener noted that they have 
learned, for example, that faculty enthusiasm for WAC 
can be generated and that inroads into the organizations at 
both campus and administration levels can be made. 



PRESENTING A UNIFIED FRONT IN A 
UNIVERSITY WRITING AND TESTING PROGRAM 

Speakers: Lana Silverthorne, University of South 
Alabama 

Patricia Stephens, University of South 
Alabama 

Introducer/ 

Recorder: Gail A. Koch, University of Minnesota 

How can we foster institutional consensus about 
undergraduate writing in a university? Lana Silverthorne 
and Patricia Stephens answered this question by focusing 
on university-wide participation and dialogue. They 
described a multilateral commitment to undergraduate 
writing that has grown incrementally over the past seven 
years at the University of South Alabama. The primary 
agent of this progress has been the continuous 
participation of faculty from various disciplines, 
especially in the construction of an upper-level writing 
across the curriculum (WAC) program. 

The impetus for ongoing development of the upper- 
level WAC program has been a week-long summer 
seminar for faculty across the disciplines, including one 
representative from each of the undergraduate departments. 
It has been repeated annually since 1981. The seminar 
work is guided by the Director of the University Writing 
Program and by an outside consultant. The participants 
write and talk about the purpose of writing in their junior 
and senior courses. They get acquainted with the practice 
of continuous "writing-to-learn" and with its potential 
uses in their courses. They put together a proposal for a 
sequence of "writing-to-learn" assignments to be tried and 
revised in their own courses over several quarters, and they 
review each other's WAC proposals. They learn ways of 
responding to students' efforts to "write-to-learn." 

According to Silverthorne, the WAC seminar, 
first conceived as a means to convert, has by now become 
a forum for faculty leadership. Participants become the 
teachers of upper-level content courses designated as 
writing courses. By now, at least half of the faculty are 
teaching such courses. (Students are now required to take 
two such courses, one in their major, and there are now 
about 70 such courses available each quarter.) WAC- 
experienced faculty influence the criteria by which a 
content course can be designated as a writing course. 
They give precedence to continuous writing in content 
courses over production of the "one-shot" term paper, and 
they sanction "discovery" writing which encourages 
students to "bring their own experiences to bear upon 
subject matter." 

Silverthorne noted that holistic assessment of essays 
composed by transfer students who have had Freshman 
English elsewhere has provided a second opportunity for 
building consensus at the University of South Alabama. 
Piloted in 1983, the test has recently become a 
requirement. The test prompt mirrors the emphasis on 
personal writing in the University's first quarter of lower- 
level composition and on the exit test given at the end of 
this first quarter of writing. Students are given a choice of 
three prompts. They have two hours to write with 
dictionaries and handbooks. Students are informed of the 
general criteria by which their essays will be judged. Each 
paper gets three readings, and the evaluation determines 
whether or not a tested transfer student starts in the first of 
the University's writing courses. Since 1983, about 75 
percent of the students have passed the test. The transfer 
test essays are assessed by a cadre of faculty readers from 
various disciplines who teach the upper-level content 
courses designated as writing courses. Their decision is to 
pass or fail an essay. If an essay arouses irresolvable 
ambiguity in one reader, it is passed on to two additional 
readers for the pass/fail decision. 




Records on this assessment process bear out the 
claim of active university-wide participation of faculty. 
Between the fall of 1986, fifty-six faculty have served as 
readers, about 71% of them from the professorial ranks. 
Their distribution by department or discipline shows 
variety: 7% Business; 34% English; 9% Humanities and 
the Arts; 18% Medical Sciences and Nursing; 14% Natural 
Sciences and Engineering; 18% Social Sciences and 
Education. The records also show high inter-reader 
agreement. Figures over twelve quarters between the fall 
of 1983 and the fall of 1987 show the average rate of 
agreement to be 87.2% in the first year. The local reading 
was tested against the judgment of external readers. With 
the help of the NTNW, a study was conducted to compare 
the assessments of five local readers to that of three readers 
at CUNY. The rate of agreement between the two groups 
overall was nearly 80%. 
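
Agreement figures of this kind are usually simple percentages of matching decisions. The Python sketch below shows one way such a rate could be computed. It is an illustration under stated assumptions: the summary does not say how the South Alabama figures were calculated, and the pass/fail lists are invented sample data. Only the departmental percentages are taken from the text above.

```python
from typing import Sequence


def agreement_rate(reader_a: Sequence[str], reader_b: Sequence[str]) -> float:
    """Percentage of essays on which two sets of pass/fail decisions match."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must judge the same non-empty set of essays")
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return 100.0 * matches / len(reader_a)


# Hypothetical decisions on ten essays ("P" = pass, "F" = fail).
local = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "P"]
external = ["P", "P", "F", "F", "F", "P", "P", "P", "P", "P"]
rate = agreement_rate(local, external)
print(f"{rate:.1f}% agreement")

# Reader distribution by discipline as reported in the session summary;
# the check confirms that the shares account for all readers.
shares = {
    "Business": 7,
    "English": 34,
    "Humanities and the Arts": 9,
    "Medical Sciences and Nursing": 18,
    "Natural Sciences and Engineering": 14,
    "Social Sciences and Education": 18,
}
total_share = sum(shares.values())
print(total_share)
```

Raw percent agreement does not correct for matches that would occur by chance, so it tends to look higher on a two-category pass/fail decision than on a six-point scale.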

Patricia Stephens took up the matter of the reasons 
for the high degree of consensus in this assessment 
process. She cited the quality of the WAC seminars, the 
credibility of the program director, and administrative 
support and incentives. Faculty who are developing a new 
upper-level writing-designated content course are released 
from teaching one course, and the enrollment in their 
writing-designated content course is reduced to 25. A 
participant in the week-long WAC seminar is paid $400; a 
reader for the transfer test essay who, on an average, 
judges 35-40 papers, receives an honorarium of $50. 
Stephens stressed the importance of the faculty's common 
concern for students' development as effective writers, 
underscoring Silverthorne's contention that drawing upon 
faculty from various disciplines creates a university-wide 
sense of responsibility for the quality of students' writing 
and fosters a continuing university-wide dialogue about 
writing standards. 

The continuing dialogue is crucial. Stephens 
described "calibration sessions." In these sessions, 
readers consider their common purpose of helping students 
to improve their writing and discuss the general criteria or 
qualities by which they decide to pass or fail a test essay 
in relation to this common goal. There are four qualities, 
a number kept small on purpose, to head off a 
penchant to "read for everything we know in our various 
disciplines." The naming of the criteria, too, is kept 
simple and true to the holistic assessment principle of 
reading for general impression: Invention (Has the writer 
of the essay been thoughtful, reflective, candid?); 
Arrangement (Has the writer achieved wholeness, made a 
piece of it?); Development (Has the writer recognized and 
fleshed out the point of the essay, giving it credibility and 
validity?); Style (Does the essay have clarity, give evidence 
of the writer's own voice, the writer's own crafting, and 
editing?). 

The dialogue amongst faculty continues through 
instructional use of carefully kept records. Results of 
inter-reader reliability and validity studies are shared with 
readers to help them evaluate their own reading 
performance in relation to that of the others. Readers are 
given detailed information about the results of their own 
decisions, a statistical summary of each reading session, 
and a cumulative summary of all reading sessions. In 
addition, the readers are rated and their ranking reported to 
them. They are rated on three bases: experience, 
reliability, and validity (or the fit between their judgments 
and other information about students such as GPAs and 
ACT scores). In short, readers have regular, informed 
opportunities to reflect upon the relative fit of their 
judgment with the consensus. 

One last piece of information about the consensus 
reported by Silverthorne and Stephens is that the 
membership of the transfer-test reading group is stable, 
the chief movement being the addition each year of two 
new members from the summer WAC seminar. Once 
having assumed the role, very few have ever repudiated it. 
Stephens pointed out that it is in the faculty's interest to 
be involved: reading the test essays serves as a useful 
means by which faculty who teach writing-designated 
junior and senior courses can gauge students' readiness to 
deal with the "writing-to-learn" orientation of their 
courses. 



EVALUATING A LITERACY ACROSS THE 
CURRICULUM PROGRAM: DESIGNING AN 
APPROPRIATE INSTRUMENT 

Speaker: Linda Shohet, Dawson College, 
Montreal 

Introducer/ 

Recorder: L. Lee Forsberg, University of 
Minnesota 

Linda Shohet, director of the Literacy Across the 
Curriculum Center at Dawson College, has taught 
Canadian literature and writing at Dawson since 1973. 
She began developing the Literacy Across the Curriculum 
program in 1984; the Center now provides instructional 
and consultation services to English high schools and 
colleges throughout the province. She began the session 
by reviewing the language-related political issues in 
Quebec, and then she sketched the development of the 
center and discussed the evaluation of the program 
scheduled this spring at Dawson. 

Quebec is a unilingual province in a bilingual 
country. French (first language) speakers comprise 24.6 
percent of the population of Canada and 83.5 percent of 
the population in Quebec. English (first language) 
speakers comprise 68.2 percent of the population in 
Canada and 12.7 percent of the population in Quebec. 
French speakers see the maintenance of their language 
politically, as the survival of their culture. Consequently, 
language awareness is high. 

Dawson College is a two-year, English language 
community college; all students going on for a University 
degree must first complete community college. The 
Literacy Across the Curriculum program was initiated by 
the faculty development committee, not the English 
Department, and its administration remains in the faculty 
development office. Keeping it out of the English 
department gives the program a broader base of support 
and institutional commitment, Shohet said. The program, 
originally intended as internal, soon started receiving 
requests from English-language high schools and other 
colleges, asking for ideas and resources. As the program 
expanded to meet those needs, costs rose. The only source 
of additional funding was the government, which required 
evidence that the program was relevant to the entire 
community, including French-language schools and 
colleges. The Dawson College administration had ordered 
an evaluation of the program to show its value to its own 
faculty before supporting expansion. 

The bureaucratic demand, Shohet said, is to ask how 
much student literacy has increased as a result of a 
program; her response consisted of showing how faculty 
have responded and how classroom activities have 
changed. She also commented that the outcomes of a 
literacy across the curriculum program are not limited to 
reading and writing instruction. At Dawson, faculty 
members began attending more faculty development 
seminars, interacting across departments, and volunteering 
to develop classroom projects (when previously, they had 
been embarrassed to be seen at a writing workshop). 
Faculty publications also increased. 

The design of an evaluation instrument began with a 
model developed at San Diego State College, which 
helped define an evaluation that would be particular to 
Dawson. The college also employed an outside 
consultant, Shohet noted, which gave the evaluation 
additional objectivity. She cautioned against using 
generic evaluation instruments; each program develops 
with its own goals and philosophy, and must be evaluated 
on that basis. 

The Dawson evaluation focused on particular 
questions about classroom activities before and after 
workshop attendance: class time spent writing; time 
spent talking about writing; writing assignments; use of 
journals; working with drafts; oral communication 
assignments; use of library resources. In categories 
covering writing, reading, speaking, and listening skills, 
the evaluation attempted to determine what changes 
instructors had made in their classes, what goals they had 
for center programs, and whether participation in center 
programs had promoted educational exchange with other 
faculty members or increased levels of theoretical 
knowledge. One section defines the program's objectives 
and asks faculty to evaluate objectives as appropriate and 
applicable; this type of inquiry not only helps refine 
program goals for future planning, but reinforces 
awareness of program objectives among faculty members 
who respond, Shohet said. 

The center ran a pilot study, distributing evaluation 
forms to 150 randomly selected faculty members, chosen 
from those who had attended workshops. About 100 
responses were returned; some questions were refined. The 
evaluation in its final form will be distributed to faculty 
members across the curriculum this spring. Shohet will 
report on the results at next year's NTNW Conference, 
which will be held at Dawson College. (Shohet is the 
Conference Co-Coordinator.) 



VALIDITY ISSUES IN DIRECT WRITING 
ASSESSMENT 

Speakers: Karen Greenberg, NTNW and Hunter 
College, CUNY 

Stephen Witte, Stanford University 

Introducer/ 

Recorder: Joanne Van Oorsouw, College of St. 
Catherine, Minnesota 

Karen Greenberg began with what she deemed a 
radical statement: "I have examined more than 600 
writing tests and have yet to see one that I would consider 
to be a valid one." She went on to state that it seems 
impossible for writing tests, with their narrow subjects, 
implausible audiences and severely restricted time frames, 
to reflect the natural processes of writing in either 
academic or personal contexts. 

Greenberg explained her position by pointing out 
that writing consists of the ability to discover what one 
wishes to say and to convey one's message through 
language, content, syntax and usage that are appropriate 
to one's audience and purpose. In light of this, she said, 
it is particularly distressing to note that teachers at many 
institutions find themselves administering tests that bear 
little resemblance to this definition or to their curricula 
and pedagogy. For example, many schools still use 
multiple-choice tests of writing even though this type of 
testing does not elicit the cognitive and linguistic skills 
involved in writing. 

She stated that writing sample tests, on the other 
hand, can assess writing capacities that cannot be 
measured by existing multiple-choice tests. They also 
have flaws, however, and many problems result from 
our reliance on single-sample writing tests for placement 
and proficiency decisions. She warned that a single 
writing sample can never reflect a student's ability to write 
on another occasion or in a different mode. Yet, according 
to surveys conducted by NTNW and CCCC, thousands of 
schools across the country continue to assume that 
"writing ability" is stable across different writing tasks 
and contexts and continue to use a single piece of writing 
as their sole assessment instrument. 

Greenberg then went on to suggest what those 
involved in large-scale direct assessment of writing 
should do about validity. The first step in establishing a 
test's validity is to determine its purpose: what 
information is needed by which people and for what 
purposes? The next step is developing a clear definition 
of the writing competence that is being assessed, one that 
will vary according to the purpose and context of the 
assessment. Developing this definition is a critical step 
in creating a valid assessment, but it is easier said than 
done for there is as yet no adequate model of the various 
factors that contribute to effective writing in different 
contexts. Finally, after coming to agreement on their 
definition of writing competence, faculty need to establish 
consensus about the writing tasks that are significant in 
particular functional contexts. 

Greenberg noted that she deliberately chose to talk 
about faculty rather than test developers, for she believes 
that the people who teach writing should be the ones who 
develop the assessment instruments. Faculty need to 
work together to develop tests, to shape an exam they 
believe in so that they can be sure its principles infuse 
curriculum and classroom practice. Even when faculty 
work together, however, Greenberg said that definitions of 
competent writing may vary dramatically. Locally- 
developed essay tests show incredible variability in the 
skills measured, due to differences in the range of skills 
assessed and the criteria used to judge those skills. For 
example, faculty often differ about the range of discourse 
structures that they should teach and that a test should 
assess. One way to sample students' ability to write 
different types of discourse is to use the portfolio method, 
in which writers select three or four different types of 
drafts and revisions for evaluation. This kind of 
assessment reflects a pedagogy that emphasizes process 
over short, unrevised products. Thus, this kind of test 
stimulates writing teachers and programs to pay more 
attention to the craft of composing. 

Greenberg's final point was phrased as a question: 
What is the relationship between what we teach and what 
we test? We cannot, and should not, separate testing from 
teaching, and we as a profession must be more concerned 
with the validity of both of these efforts. 

Steve Witte summarized a study begun in 1982 
which sought to answer two research questions: (1) Do
writing prompts that elicit different types of writing and
that elicit written texts of the same quality cause writers 
to orchestrate composing in different ways? and (2) Do 
comparable prompts that elicit the same type of writing 
and elicit written texts of the same quality cause writers to
orchestrate composing in different ways? Witte stated that 
although this study did not investigate naturally occurring 
discourse, this type of experimental study can inform the 
kinds of conceptualizations we can make beyond the 
experimental study. 

The first step in conducting this research was to
create two comparable writing tasks of each of two types:
expository and persuasive. Prompts were created after
consultation with students, writing teachers, high school
teachers, and pre-service high school teachers. Those
prompts found to be comparable by these groups were
then pretested and were found to elicit comparable ranges
of writing quality. The subjects were 40 volunteer college
freshmen at the University of Texas who were randomly
assigned to one of the four tasks. Think-aloud protocols
and rough drafts were collected and analyzed according to a
coding scheme developed by the experimenters. The
results of a multivariate ANOVA showed that 16 variables
distinguished between the persuasive and expository tasks;
these variables included generating ideas, setting content
goals, and reviewing text. Writers tended to set more content
goals and generate more ideas for the expository tasks and
set more rhetorical goals for the persuasive tasks. A
discriminant analysis was done to determine which
variables distinguished among all four tasks. Eleven
variables were found to do this.

Witte stated that the findings indicate that writers engage
in different kinds of processes for different kinds of tasks.
In terms of writing assessment, each prompt we use to
assess ability will be measuring different dimensions of
that ability. The obvious conclusion, then, is that there
is no way to assess writing ability with only one task or
prompt. We do not yet know how many prompts or tasks
might be needed. Witte also noted that this study should
make us question models of the writing process that are
based on protocols from just one task. More research of
the type presented here, studies that examine the effects of
context on process, is needed. In Witte's study, context
was limited to the writing prompt, a part of the context
important to writing assessment. He said that we need
more research that will help us identify how writing
processes are circumscribed by other aspects of context.



RELIABILITY REVISITED: HOW MEANINGFUL
ARE ESSAY SCORES?

Speaker: Edward White, California State
University, San Bernardino

Introducer/

Recorder: Karen Greenberg, NTNW and CUNY

Ed White began the session by offering a clear
definition of reliability: it is the consistency of
measurement over different test situations and contexts.
He explained the various types of reliability and discussed
their origins in agricultural research. He briefly discussed
validity in educational research and noted that reliability is
"the upper limit for validity" (i.e., no test can be any
more valid than it is reliable).

Next, White discussed "true scores," the "standard
error of measurement," and uncertainty in measurement.
The true score of a test is a Platonic ideal: it is the mean
score of repeated attempts at the test under identical
conditions. Since we can never determine a student's true
score on a test, we need to calculate the test's standard
error of measurement (a statistical estimation of the
standard deviation that would be obtained for a series of
measurements of the same student on the same test).
White pointed out that because of the error in all
measurement, no single score is reliable enough to be
used as the sole determinant of any particular ability or
skill.

Next, White explained the problems in essay test
reliability. He compared the reliabilities of holistic
scoring, analytic scoring, and multiple-choice scoring; and
he discussed the difference between inter-rater reliability
(agreement between different raters) and intra-rater
reliability (agreement of a rater with him/herself at
different points in time). White commented that rater
disagreements over the quality of holistically-scored essays
do not constitute "errors." The traditional psychometric
paradigm of reliability cannot help us with a phenomenon
such as subjective judgment, which may be better
determined through rater disagreements rather than through
their agreements. This led White to a discussion of
"generalizability theory" and its implications for the
reliability of essay test scores. He noted that our goal
should be a reduction in the number of rater disagreements
of more than two scale points (these should occur no more
than 5% of the time in any scoring session).

White ended with suggestions for increasing the
reliability of essay testing. Essay test administrators
should reduce the sources of variability in test contexts
(by controlling as many variables as possible), should
keep the scoring criteria constant, should pre-test and
control test prompts, should control essay reading and
scoring procedures, and should always try to use multiple
measures to assess students' skills.
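
The two quantities in White's talk, the standard error of measurement and the share of rater disagreements larger than two scale points, can be illustrated with a short sketch. The abstract gives no formula, so the code below assumes the usual classical test theory expression, SEM = SD x sqrt(1 - reliability). The function names and all numbers are hypothetical and serve only as illustration.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def large_disagreement_rate(rating_pairs, max_gap=2):
    """Share of papers whose two ratings differ by more than max_gap points."""
    if not rating_pairs:
        raise ValueError("need at least one pair of ratings")
    large = sum(1 for first, second in rating_pairs if abs(first - second) > max_gap)
    return large / len(rating_pairs)


# Hypothetical test: score SD of 1.2 scale points, reliability of 0.75.
sem = standard_error_of_measurement(1.2, 0.75)

# Hypothetical paired ratings on a six-point scale; only (2, 5) differs
# by more than two points.
pairs = [(4, 4), (3, 5), (2, 5), (6, 5), (1, 2)]
rate = large_disagreement_rate(pairs)

print(f"SEM = {sem:.2f} scale points")
print(f"Disagreements of more than two points: {rate:.0%}")
```

With these invented inputs the rate is well above the 5% ceiling White mentions, which a scoring session would treat as a signal to recalibrate.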




ESTABLISHING AND MAINTAINING SCORE
SCALE STABILITY AND READING RELIABILITY 

Speakers: Wayne Patience, GED Testing Service 
Joan Auchter, GED Testing Service 

Introducer/ 

Recorder: Anne Aronson, University of Minnesota 

Wayne Patience and Joan Auchter presented the
procedures used by the General Education Development
Testing Service (GEDTS) to evaluate essay exams required
as part of the GED Test for individuals seeking high
school equivalency diplomas. They described and 
illustrated the methods employed by GEDTS to establish 
and maintain stability or consistency of scoring, and 
reliability among readers, despite the decentralized nature 
of their evaluation program. 

Patience explained that the notion of equivalency
derives from: (1) defining the content of the GED Tests
so as to reflect the community expected outcomes of
completing a traditional high school program of study and
(2) defining passing scores relative to the actual
demonstrated performance of contemporary graduating
seniors. Only those examinees who receive scores that are
better than those of 30% of high school seniors are awarded the
diploma. The job of the GED staff is to describe
the skills and content knowledge that characterize the work
of high school seniors rather than to prescribe levels of
achievement.

The recent addition of an essay exam to the Writing 
Skills Test created questions about how to establish and 
maintain reliability. The GEDTS's first activity was to 
develop a scoring scale that would have the same criteria
regardless of time or place. By administering an essay 
exam to thousands of high school seniors, sorting those 
essays into six stacks, and describing the characteristics of 
each stack, the Writing Committee of GEDTS was able to 
develop a holistic scoring scale that has been used
successfully in hundreds of sites nationwide. 

Auchter then reported on how GEDTS insures 
stability and reliability in the use of the scoring guide. A 
permanent GEDTS Writing Committee, consisting of 
practicing language arts professionals, selects the topics
and the papers that are used in training, certifying, and
monitoring site trainers and readers. The Writing 
Committee chooses and tests "expository" topics that do 
not require students to have any special knowledge or 
experience. The next step is for GEDTS to train and 
certify Chief Readers who are responsible for insuring that 
the GED scoring standards are applied uniformly. During 
the 2 1/2 day training, Chief Readers learn to overcome
personal biases (e.g., responses to handwriting) that may 
influence scoring, and to use the language of the scoring 
guide alone to describe and evaluate papers. Sets of 
training papers contain a range of papers for each point, to 
illustrate the fact that there is no "perfect" paper for each 
point, but that there is typically a distribution of high, 
medium, and low papers. Training packets also include 
problematic papers (e.g., a paper written in the form of a
rap song). Since the national average for high school 
essay scores is 3.25 and for GED scores is 2.7, training
sets contain a disproportional number of 2, 3, and 4 
papers. After working with training papers, GEDTS 
trainees are required to evaluate several sets of papers to 
determine whether or not they are currently certifiable as 
Chief Readers.

The same training and certifying procedure is carried 
out at the various decentralized testing sites, with the 
Chief Readers responsible for training and certifying 
readers. Auchter noted that language arts teachers trained 
through this process feel better about teaching writing and 
about using holistic grading in the classroom. 

Further steps to insure score scale stability and
reliability are site certification and monitoring. Each
scoring site must demonstrate the ability to score essays
in accord with the standards defined by the GED Testing
Service. Essays used for site certification must receive at
least 80 percent agreement in scoring among Writing
Committee members. Although some sites may achieve
high inter-reader reliability, a site cannot pass certification
unless it achieves at least 90 percent agreement with
GEDTS essay scores (or 85 percent for a provisional
pass). Three procedures are used to monitor testing sites:
(1) the Chief Reader does third readings of discrepant
scores and records each time a reader is off the standard; (2)
readers evaluate a set of "recalibration" papers at the
beginning of each scoring session in order to establish
reliability for that day; and (3) GEDTS conducts site
monitoring using the same procedures as are used in site
certification.
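
The certification thresholds described above (90 percent agreement for a pass, 85 percent for a provisional pass) can be sketched as a small check. The abstract does not say how GEDTS computes agreement, so this sketch assumes it is the percentage of certification essays on which the site's score exactly matches the GEDTS score. The function names and sample scores are hypothetical.

```python
def percent_agreement(site_scores, reference_scores):
    """Percentage of essays where the site score equals the reference score.

    Assumption: "agreement" means an exact match on the holistic scale.
    """
    if not site_scores or len(site_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for site, ref in zip(site_scores, reference_scores) if site == ref)
    return 100.0 * matches / len(site_scores)


def certification_status(agreement_percent):
    """Apply the thresholds reported in the session: 90 to pass, 85 provisional."""
    if agreement_percent >= 90.0:
        return "pass"
    if agreement_percent >= 85.0:
        return "provisional pass"
    return "not certified"


# Hypothetical certification set of 20 essays; the site differs on three of them.
reference = [3, 4, 2, 5, 3, 3, 4, 2, 1, 6, 3, 4, 2, 3, 5, 4, 3, 2, 4, 3]
site = [3, 4, 2, 5, 3, 3, 4, 2, 1, 6, 3, 4, 2, 3, 5, 4, 3, 3, 3, 4]

agreement = percent_agreement(site, reference)
print(agreement, certification_status(agreement))
```

A site could show high agreement among its own readers and still fail this check, which is the distinction the speakers draw between inter-reader reliability and agreement with the GEDTS standard.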



TRAINING OF ESSAY READERS: A PROCESS 
FOR FACULTY AND CURRICULUM 
DEVELOPMENT 

Speaker: Robert Christopher, Ramapo College, 
New Jersey 

Introducer/ 

Recorder: Mary Ellen Ashcroft, University of 
Minnesota 

Robert Christopher emphasized the imperative of 
assessment, pointing out that assessment has always been 
intrinsic to the classroom experience, but it has now 
become extrinsic. He noted that many faculty fear writing 
assessment efforts because they represent an intrusion on 
their methods for evaluating students. He stated that 
faculty fears can be countered by several arguments: 
assessment helps students, it facilitates faculty and 
institutional research, and it is a professional activity. 

Christopher went on to suggest ways of building 
faculty consensus for assessment. In training readers, he
suggested starting with a loyal, supportive core. This 
group's primary responsibility would be the development 
of an instrument for assessment, a task which should take 
six months to a year. He suggested that good readers are 
people who are task oriented, are good collaborators, are 
preferably not new faculty members (who might not have 
a sense of writing at the institution), and who work with
"all deliberate speed." Good readers must not be "Matthew
Arnolds" before whose standard everything fails. It is 
important, according to Christopher, that a large pool of 
readers be developed, so that a small loyal group will not 
wear out. 

The next step, according to Christopher, is 
conducting a reading to build consensus. The initial
reading should consist of 500 to 1000 essays, so that 
readers get a sense of the range of writing abilities of 
students at the institution. The reading must be conducted
"blind" with each paper read and assessed twice. Essays
are identified as "strong," "weak," and "in-between"; 
readers discuss each essay and slowly evolve into an
interpretive community. 

In terms of curricular implications, Christopher
pointed out that placement assessment is easier to
accomplish and has been more fully studied than
proficiency assessment. He also noted that assessment
can be used for students to learn to talk about their writing
in small groups and in conferences, so that students learn
to be better readers and editors. Assessment can also be
used to encourage collaborative or group teaching, said
Christopher. As faculty members relinquish some control
to group or collaborative situations in the assessment
process, they learn from one another and share techniques
and materials.

In answer to conferees' questions about developing
the holistic process, Christopher suggested that the
English Department provide a core of expert readers which
should eventually grow to become interdisciplinary. He
noted that in any two-day reading of essays, there is
always the need for reliability checking ("Let's all read this
essay and make sure we're on track"). Finally, Christopher
pointed out that students benefit from holistic essay
assessment because their writing skills are evaluated by a
team of teachers. This kind of assessment program,
Christopher says, works on behalf of students.



DISCREPANCIES IN HOLISTIC EVALUATION

Speakers: Donald Daiker, Miami University, Ohio 
Nedra Grogan, Miami University, Ohio 

Introducer/ 

Recorder: Sandra Flake, University of Minnesota 

Donald Daiker presented the goals of the session:
to share the conclusions and a tentative evaluation of his
and Nedra Grogan's examination of discrepancies in
holistic evaluation. Noting that discrepancies in holistic
evaluation have been a problem from the beginning, he
raised two questions: What accounts for discrepancies in
holistic evaluation if the "quirky" reader is ruled out? And
is there such a thing as a discrepant essay?

Daiker and Grogan sought to answer these questions
using an annual holistic grading session for Miami
University's Early English Composition Assessment
Program (EECAP), a program in which 10,000 essays
written by high school juniors in a controlled setting are
evaluated for diagnostic purposes. The setting was one in
which students, using a prompt, wrote for 35 minutes in a
high school composition class. The time limitation was
dictated by the constraints of a single class period. The
goal of the holistic evaluation was essentially diagnostic,
with a scoring scale of 0 to 6. Grades of 5 or 6 indicated
clearly above average papers demonstrating strengths in all
of the rating criteria. Grades of 3 or 4 indicated papers
ranging from slightly below to slightly above average,
with combined strengths and weaknesses in the criteria or
underdevelopment. And grades of 1 or 2 indicated clearly
below average papers failing to demonstrate competence in
several of the criteria, often because the paper was too
short. A grade of 0 was used only for papers which were
off the topic of the prompt. Evaluators gave each paper a
single holistic rating, and additionally rated criteria in four
categories (ideas, supporting details, unity and
organization, and style).

The participating high school teachers (who were the
evaluators) were trained through a process of rating and
discussing sample papers, so that the rating criteria would
be internalized. Participants in the session were then 
provided with the writing assignment or prompt, the 
scoring scale, the rating criteria, a rater questionnaire, and 
one of the papers. 

To locate possible discrepant papers, Daiker looked
for three-point gaps in scoring by two evaluators and gave
such papers to both a third and fourth evaluator. If those
evaluators also disagreed on the rating of the paper, he
identified it as a potentially discrepant paper. Through
this process, four potentially discrepant papers were
identified, and those four papers were given to all 61 of
the evaluators in a session at the end of the second
weekend of evaluation. Participants in our session then
read and evaluated one of the potentially discrepant papers,
using a rater questionnaire, scoring scale, and rating
criteria. The ratings of the participants were tabulated: 1
person assigned the paper a 6, 16 assigned a 5, 28
assigned a 4, and 4 assigned a 3.
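
Daiker's screening procedure can be written as a two-stage check. The abstract leaves two details open, so this sketch makes assumptions: a "three-point gap" is read as a difference of three or more points, and the third and fourth evaluators are taken to have "also disagreed" whenever their two scores differ at all. The function names and sample scores are hypothetical.

```python
def needs_extra_readings(first, second, min_gap=3):
    """Stage 1: flag a paper whose two ratings are at least min_gap apart."""
    return abs(first - second) >= min_gap


def is_potentially_discrepant(first, second, third, fourth, min_gap=3):
    """Stage 2: a flagged paper stays suspect if readers three and four differ.

    Assumption: any difference between the third and fourth scores counts
    as a disagreement; the source does not define the threshold.
    """
    if not needs_extra_readings(first, second, min_gap):
        return False
    return third != fourth


# Hypothetical papers scored on the six-point scale by four evaluators.
print(is_potentially_discrepant(2, 5, 3, 5))  # wide gap, later readers split
print(is_potentially_discrepant(2, 5, 4, 4))  # wide gap, later readers agree
print(is_potentially_discrepant(4, 5, 2, 6))  # first two readers close
```

Only the first paper would be carried forward for rating by the full group of evaluators under these assumptions.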

Following the participant evaluation and some
discussion, Grogan presented the results of the evaluation
by 61 trained raters who rated the paper at the end of the
second weekend of evaluation, with 26 of the raters
(42.6%) giving an upper range (5-6) rating, 34 of the
raters (55.8%) giving a middle range (3-4) rating, and 1
(1.6%) giving a lower range (1-2) rating.
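
The percentages reported for the 61 trained raters can be recomputed from the counts. The sketch below uses only the three counts given in the abstract; the dictionary keys and function name are hypothetical. Computed this way, the middle-range share comes to about 55.7%, a tenth of a point below the reported 55.8%, which may reflect rounding in the original so that the three figures sum to 100.

```python
def band_shares(band_counts):
    """Convert rater counts per score band into percentages (one decimal)."""
    total = sum(band_counts.values())
    if total == 0:
        raise ValueError("no ratings to tabulate")
    return {band: round(100.0 * count / total, 1) for band, count in band_counts.items()}


# Counts reported for the 61 trained raters.
trained_raters = {"upper (5-6)": 26, "middle (3-4)": 34, "lower (1-2)": 1}

shares = band_shares(trained_raters)
for band, share in shares.items():
    print(f"{band}: {share}%")
```

The split between the upper and middle bands, with almost no ratings in the lower band, is the pattern Grogan and Daiker point to when they call the paper discrepant.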

Because of the clear division between the 5-6 and the
3-4 ratings, Grogan and Daiker believe that the paper did
qualify as a discrepant paper. Daiker reported that
discussion following the rating by the trained evaluators
suggested a correlation between the depth of emotional
response to the paper and the highness of the score.
Following some discussion about whether or not the
paper was truly discrepant, a conferee asked whether the
problem was really caused by discrepant readers who could
not be objective because of the depth of their emotional
response. Daiker argued that reader objectivity was a more
complicated issue and further argued that precisely because
the paper provokes a range of responses to the emotional
content, it could be defined as a discrepant paper.

The implications of evaluating discrepant papers were
then summarized by Grogan, who raised the issue of the 
role of holistic evaluation of a single essay that receives 
discrepant scores. She concluded that in such cases a 
single essay should not determine the fate of the writer, 
and that an appeals process clearly needs to be a 
significant part of a holistic evaluation program. 
Discussion throughout the session focused on some of the 
limitations of holistic evaluation of writing produced 
under a time constraint, on problems in establishing clear 
criteria and scales, and on problems of reader objectivity. 






PROBLEMS AND SOLUTIONS IN USING OPEN- 
ENDED PRIMARY TRAIT SCORING 

Speaker: Michael C. Flanigan, University of 
Oklahoma 

Introducer/ 

Recorder: Chris Anson, University of Minnesota 

Michael Flanigan began by outlining his 
university's plan for five years of experimental research on
the teaching and testing of writing. Much of this research 
will replicate published research studies, but original 
research will also be conducted. All of the studies will be 
controlled experimental studies so that the researchers can
be fairly faithful to the original ones and can analyze any 
differences between the original and new research. 

Flanigan discussed one study, already completed, in 
which his colleague David Mair and he combined the 
strategies of two studies by George Hillocks, an 
experimental study involving teaching extended definition
using inquiry and models and a descriptive study dealing 
with "modes of instruction" (both of which Hillocks 
discusses in some detail in his book Research on Written
Composition). In the replicated study at Oklahoma, all
twenty classes consisted of university freshmen; for nine 
of the ten teachers it was their second semester of 
teaching, and approximately 500 students were involved.

Flanigan pointed out that he chose Hillocks' studies
because both dealt with significant areas in teaching and 
writing. Extended definition represents a kind of discourse 
that permeates almost all thinking and writing. The
researchers believe that by replicating such an important 
study they could get inside the problems of the earlier 
research, and come to understand it better. The 
experimental extended definition study also used Hillocks'
open-ended primary trait scoring technique because the
researchers wanted to learn to use and understand it better.

After reporting the findings from a small sampling
of the data, Flanigan described some problems that he and
his colleagues faced as they attempted to use Hillocks'
open-ended primary trait scoring system and he discussed
the modifications they made in it to obtain reliable
results. He pointed out that with an open-ended primary
trait scoring scale theoretically there is almost no limit to
what students can score. Most scoring scales range from
1 to 6 (as in the holistic score for the ECT), 2 to 8 (as in
CLEP), 1 to 5 (as in CORE scoring) and so forth. In
open-ended primary trait scoring, the limit for a talented
student is probably dictated by time and the variation and
limitations imposed by the writing called for. In the
papers scored in this study, the top score was 28.

The traits for which students could receive scores
were: (1) properly putting an item in a class; (2) creating
criteria for the class; (3) giving examples; and (4)
providing contrastive examples to clarify and limit each
criterion. Points were not given for differentiae as in
Hillocks' original study; instead, class and differentiae
were combined (on the advice of Hillocks when the study
was set up). Hillocks' scorers had had problems reaching
agreement on this point. Students could receive 2 points
for the class, 2 for each criterion, 2 for each example, and
2 for each contrastive example. Obviously the more
criteria, examples and contrastive examples students could
come up with, the higher their score. In initial training,
scorers had problems staying close together in the higher
ranges, so Flanigan modified his tolerance of acceptability
by allowing scores in the range 1 to 10 to differ by 1
point, 11 to 20 to differ by 2 points, and 21 and above to differ
by 3 points. Scores within that range were averaged;
scores that did not meet acceptable standards were read by a
third reader. If the third reading fell within range of either
of the other two readers, then those scores were averaged.
If there still was no agreement, a fourth and fifth reader
scored the paper, and the paper and the range of scores
were given to the researchers and a score was determined.
For example, one paper was scored 6 and 8; a third reader
gave it 10; the fourth reader gave it 9, and the fifth reader
gave it 7. Its final average was an 8. Only seven papers
required the fourth and fifth reader. Often, readers had
problems keeping clearly in mind the kinds of criteria the
writers were developing. To simplify the process, any
one clear criterion could be accompanied by a number of
examples and contrastive examples. If no criterion was
given, only one example could be counted. If an
undeveloped example or string of general examples was
given, a score of 1 was given.
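
The point scheme and the sliding tolerance described above can be sketched in code. Several choices below are assumptions rather than details from the session: the tolerance band is picked by the higher of the two scores being compared, a third reading is averaged with the closest original score it agrees with, and the five-reader case is resolved by the mean of all five scores (this reproduces the example's final score of 8, but the abstract says only that the researchers determined a score). The special rules for papers with no criterion or only undeveloped examples are not modeled. All function names are hypothetical.

```python
def trait_score(criteria, examples, contrastive_examples, has_class=True):
    """Two points for the class and two for each criterion and example."""
    return 2 * (int(has_class) + criteria + examples + contrastive_examples)


def tolerance(score):
    """Allowed difference between readers, widening for higher scores."""
    if score <= 10:
        return 1
    if score <= 20:
        return 2
    return 3


def within_tolerance(a, b):
    # Assumption: the band is chosen by the higher of the two scores.
    return abs(a - b) <= tolerance(max(a, b))


def resolve_score(first, second, third=None, extra=()):
    """Return the final score following the adjudication steps in the text."""
    if within_tolerance(first, second):
        return (first + second) / 2
    if third is None:
        raise ValueError("readers disagree: a third reading is required")
    partners = [s for s in (first, second) if within_tolerance(third, s)]
    if partners:
        # Assumption: pair the third reading with the closest agreeing score.
        closest = min(partners, key=lambda s: abs(s - third))
        return (third + closest) / 2
    if len(extra) < 2:
        raise ValueError("still no agreement: fourth and fifth readings required")
    # Assumption: the researchers' determination is modeled as the mean of all five.
    scores = [first, second, third, *extra]
    return sum(scores) / len(scores)


# The example from the session: 6 and 8, then 10, then 9 and 7.
print(resolve_score(6, 8, third=10, extra=(9, 7)))
# A hypothetical paper: one class, three criteria, four examples, two contrasts.
print(trait_score(criteria=3, examples=4, contrastive_examples=2))
```

Widening the tolerance with the score reflects the problem Flanigan reports: on an open-ended scale, readers drift further apart as the number of countable traits grows.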

Flanigan concluded that open-ended primary-trait
scoring offers real promise, for it allows for a kind of
differentiation that closed, limited systems do not.
However, researchers who use the system will probably
have to modify it to get consistent, reliable scores. They
will also have to plan their research so that the traits they
are describing and scoring are clear, well-defined, and fully
conceptualized by their scorers. The session ended with the
speaker giving participants six papers that had been scored
by three readers and leading participants through a guided
scoring session.



THE IMPLICATIONS OF THE RHETORICAL 
DEMANDS OF COLLEGE WRITING FOR 
PLACEMENT 

Speaker: Kathryn Fitzgerald, University of Utah 
Introducer/ 

Recorder: Linda Jorn, University of Minnesota

Kathryn Fitzgerald gave participants attending this session 
a chance to analyze student writing in terms of rhetorical 
evaluation criteria developed at the University of Utah. 
These criteria are intended to do the following:
(1) describe the rhetorical situation college students face 
when asked to write an essay that will be assessed and 
used to place them in a freshman writing course; (2) assert 
that the evaluation of the rhetorical situation provides 
valid criteria for placement of students into various levels 
of freshmen writing courses; and (3) shape the discussion 



THE SEVENTH ANNUAL 
CONFERENCE ON WRITING 
ASSESSMENT is a national conference 
designed to encourage the exchange of 
information about writing evaluation and 
assessment among elementary, secondary, and 
postsecondary administrators, teachers, and 
test developers through forums, panels, and 
workshops. 

Research on Writing Assessment • Writing
Assessment Across Cultures and Languages •
Developing Essay Tasks • New Models of Scoring
Essays • The Impact of Testing on Curricula and
Pedagogy • Problems in Exit and Proficiency
Testing • Computers and Writing Assessment •
Writing Program Evaluation • National Standards
for Writing • Portfolio Assessment



Note: Springboards, Quebec's annual language 
arts conference, will take place April 13th and 
14th in Montreal. For information, call Fran Davis 
at (514) 484-7646. E.S.T 

PRE- AND POST-CONFERENCE WORKSHOPS:

To provide in-depth exploration of selected 
issues, four workshops have been added to this 
year's program. Enrollment for each is 
limited, so register as soon as possible. The fee 
for each workshop is $40 US, $50 Canadian 
(which includes lunch, coffee breaks, and all 
materials). 

Pre-Conference: Saturday, April 8 

9:30 a.m. -4:00 p.m. 

A. Computers in Writing 

Leaders: Helen Schwartz & Michael
Spitzer (30 participants) 

B. Large-Scale Writing Assessment 
Leader: Edward White 

(50 participants) 

Post-Conference: Wednesday, April 12 

9:30 a.m.-4:00 p.m. 
C. Portfolio Assessment

Leaders: John Dixon & Peter Elbow 

(50 participants) 
D. Writing Assessment Across Cultures 

Leaders: Alan Purves and colleagues

(50 participants) 



Sunday, April 9, 1989

6:00 p.m. - 7:30 p.m.
Conference Opening 

Speakers: 

Carolynn Reid-Wallace, Vice 

Chancellor, CUNY 
Gerrard Kelly, Director General 

Dawson College 
Rexford Brown, Education 

Commission of the States 

7:30-9:30 p.m. Reception 



Monday, April 10, 1989 

9:00 a.m. - 10:30 a.m. 
Opening Plenary Session 

Speakers: 

Joseph S. Murphy, Chancellor, The 
City University of New York 

John Dixon, Author of Growth Through
English

10:30 a.m. - 5:30 p.m. 
Concurrent Panels/Workshops 



Tuesday, April 11, 1989 

9:00 a.m. - 10:15 a.m. 
Second Plenary Session 

Speaker: 

Janet White, Deputy Director, 
NFER, England 

10:30 a.m. - 11:45 a.m. 
Concurrent Panels/Workshops 

12:00 p.m. - 2:00 p.m.
Luncheon and Closing Session 

Speaker: 

Bernard Shapiro, Deputy Minister
of Education, Ontario 

2:15 - 5:00 p.m. 

Concurrent Panels and Workshops 






REGISTRATION INFORMATION

Registration is limited to 700 people. Before March
9, the registration fee of $100 US or $125
Canadian per participant includes:

• opening night reception (April 9)
• 2 continental breakfasts (April 10, 11)
• conference luncheon (April 11)
• coffee and tea between sessions
• all conference materials

After March 9, 1989, the registration fee is $120
US or $150 Canadian.

For further information, call Linda Shohet
514-931-8731 or Karen Greenberg 516-766-8099

HOTEL RESERVATIONS

Please write or call the Centre Sheraton Hotel
directly before March 7, 1989 to receive our 
special conference rates: 

Single: $115 Canadian* 
Double: $125 Canadian* 

'There is qa lax on hotels in Montreal. The United 
States/Canadian exchange rate varies: the cost of the room in
US dollars will be 20% - 25% less than the Canadian rates above.

Address: Le Centre Sheraton Hotel 

1201 Dorchester Blvd. West 
Montreal, Quebec 
Canada H3B 2L7 

Phone: (514) 878-2063 



THE SEVENTH ANNUAL CONFERENCE ON WRITING ASSESSMENT 



Registration Form 

NAME 

(last name) (first name) 
TITLE 

INSTITUTION 

MAILING ADDRESS 



HOME PHONE ( ) WORK PHONE ( ) 

Conference Registration Fee: Before March 9, 1989: $100 US; $125 Canadian 

After March 9, 1989: $120 US; $150 Canadian

Workshop Fee: If you are registering for a pre- or post-conference workshop, please write a separate check 
for each workshop (so that we may return your check if the workshop has already reached its limit). Each 
workshop is $40 US, $50 Canadian. Please circle the letter of each workshop for which you are registering:

Sat. April 8: A_ B_ Wed. April 12: C_ D_ 

Please make all checks payable to: National Testing Network 1989

Please mail completed registration form and check(s) to: 
Linda Shohet 

3040 Sherbrooke Street W. 
Montreal, Quebec, Canada H3Z 1A4






of the student writing by providing a holistic view of 
writing. 

Before handing out samples of student writing, 
Fitzgerald discussed the theoretical background for 
developing the rhetorical criteria. She also reviewed some 
of the common problems of holistic scoring, emphasizing 
the fact that holistic scoring does not consider the different
purposes of writing (for example, persuasive vs. self-expressive
writing). The rhetorical evaluation criteria
developed at the University of Utah were designed to 
alleviate some of the problems encountered when using 
holistic scoring. The criteria help readers consider the 
purpose of students' writing and identify the internal and 
external purposes of the writing situation. 

Fitzgerald pointed out that students' internal and 
external purposes complicate the writing situation for 
them. At the University of Utah, faculty feel that the
purpose for students' writing needs to come from the 
students (i.e., internal), but in academia the purpose often 
comes from the instructor and is motivated by grades 
(external). The student has to think up his or her purpose 
for writing and must shape this purpose to serve the 
academic external purpose. Therefore, the student's 
purpose is always dual. These internal and external 
purposes are in essence the rhetorical situation and they 
need to be taken into account when faculty evaluate 
writing, particularly when this evaluation is used to place 
freshmen into English courses. Students' ability to handle 
this complex rhetorical situation informs instructors of 
the students' readiness for college writing. 

Next, Fitzgerald described how the rhetorical
expectations of University of Utah professors were 
determined and used to develop the rhetorical evaluation 
criteria. These criteria consist of the following categories: 

Category 1: The writer's relationship to college
readers and writers. Expectations: The most
proficient writers recognize that any single piece of 
college writing is part of an ongoing written 
discussion about a topic and that they are expected to 
make a contribution to the discussion. They
recognize that an authority (i.e., professor, test giver)
identifies issues for discussion. 

Category 2: The writer's relationship with his or 
her subject matter. Expectations: College writers
control their subject matter, pressing it into service to 
support their internal and external purposes.

Category 3: The writer's relationship to the 
conventions of the genre. Expectations: College
writers employ syntactical units appropriate to their 
thought, precise vocabulary, and the mechanics and 
spelling of standard written American English. 

University of Utah students are given placement essay
directions that explain the external rhetorical situation, and
they have 45 minutes to plan, write, and revise their
essays.

After reviewing the theoretical background and the 
criteria, participants used these criteria to evaluate and 
discuss some student writing. Fitzgerald pointed out that 
readers are told to pay attention to content and 
reasonability, that there are no hard and fast rules, and that 
judgment is a balancing act of various criteria and 
expectations of each institution. Readers at the University
of Utah look at the quantity of student writing as relative 
to every piece of writing. In summary, Fitzgerald stated 
that these rhetorical evaluation criteria force readers to 
evaluate writing for its purpose, help readers define good 
college writing, and address the need to teach students 
about the effect that the rhetorical situation has on their 
writing. 



USING VIDEO IN TRAINING READERS OF
ASSESSMENT ESSAYS

Speaker: George Cooper, University of Michigan

Introducer/ 

Recorder: Terence Collins, University of 
Minnesota 

Large-scale testing programs face a recurring
problem of reader consistency and reliability. In this
presentation, George Cooper demonstrated how the
English Composition Board at the University of Michigan
uses a video presentation of reader "standardization
sessions" for self-monitoring within the reader cadre, for
training new readers, and for disseminating information
about the ECB's procedures to various campus
constituencies. While Cooper presented alone, his
remarks were prepared with Liz Hamp-Lyons.

In its placement readings, members of Michigan's
ECB teams are guided by statements of criteria clustered
under three headings: "structure of the whole essay,"
"smaller rhetorical and linguistic units," and "conventions
of standard English surface features." Students write
essays in response to prompts that define a situation and
provide several choices of opening sentences. Two
important characteristics of the 6000 student essays, then,
are that topic choice is limited and orientation toward the
topic is guided through provision of choices for essay
openings. Further, the essays are rated for placement;
recommendations fall into one of the following categories:
exempt (7%), Introductory Composition (82%), and
tutorial (11%). These recommendations reflect scores of
1, 2-3, and 4. While criteria for quality are outlined to
readers, no specific calibration of trait content for the
four-point range is provided.
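The score bands and reported shares above can be laid out as a small lookup. This is only an illustrative sketch: it assumes the scores 1, 2-3, and 4 correspond to the three categories in the order listed, which the abstract implies but does not state, and the names `SCORE_TO_PLACEMENT` and `expected_counts` are hypothetical.

```python
# Assumed mapping: scores 1, 2-3, 4 taken "respectively" with the
# categories as listed in the abstract (not confirmed by the source).
SCORE_TO_PLACEMENT = {
    1: "exempt",
    2: "Introductory Composition",
    3: "Introductory Composition",
    4: "tutorial",
}

# Percentages as reported in the abstract.
REPORTED_SHARE_PERCENT = {
    "exempt": 7,
    "Introductory Composition": 82,
    "tutorial": 11,
}


def placement_for(score):
    """Return the placement recommendation for a 1-4 score."""
    if score not in SCORE_TO_PLACEMENT:
        raise ValueError("score must be an integer from 1 to 4")
    return SCORE_TO_PLACEMENT[score]


def expected_counts(total_essays):
    """Approximate number of essays per category, using integer arithmetic."""
    return {
        category: total_essays * percent // 100
        for category, percent in REPORTED_SHARE_PERCENT.items()
    }
```

Applied to the 6000 essays mentioned above, the reported shares work out to roughly 420 exemptions, 4920 Introductory Composition placements, and 660 tutorial placements.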






NOTES from the 
NATIONAL TESTING NETWORK IN WRITING 



The City University of New York 
and the Fund for the Improvement of 
Postsecondary Education 



Directors, NTNW 
Karen Greenberg 
Harvey Wiener 
Richard Donovan 



Editors, NOTES 
Karen Greenberg 
Virginia Slaughter 



Published by the Instructional Resource Center, CUNY Office of Academic Affairs 
Harvey Wiener, Director 



Scoring in this system depends on achieving what
Cooper calls a "community of values" among readers.
The video of reader standardization sessions grew out of
one summer's experience in which this community of
values had been lost; as Cooper put it, "readers were using
an unimaginable range of criteria by which to evaluate
essays" and "had become entrenched in their own
perspectives." The original motive for the video was
self-examination. Through videotaping daily standardization
sessions in which papers receiving "split" scores were the
focus of discussion, Cooper's team of readers sought to
capture the articulation of values giving rise to the
discrepancies and to record the process of moving to
agreement on application of criteria. This led the team to
analyze and communicate important characteristics of their
standardization sessions and our assessment as a whole.
Also, this procedure modeled a process of "give-and-take"
that was helpful in training new readers and in explaining
the placement process to various departments.



From ten hours of session tapes, the team assembled
thirty-five minutes of actual exchanges interspersed with
explanation and highlighting. The standardization
discussion presented in the tape enacts what Cooper calls
"positive sharing," talk marked by the various readers'
attempts to recognize the qualities in an essay that lead to
divergent scoring, each reader's comments leading to
further discussion and finally to agreement. Such
discussion (whether on the tape or in person at the start of
a reading session) reminds participants of the criteria
governing scoring. It serves the further purpose of
helping group members realize the vitality of the act of
reading, placing an apparently perfunctory reading act (in
the context of reader-response theory) into the full context
of extra-textual factors that shape readings in open view.
The importance of reflecting on the evaluator as reader,
co-creator of a text, rests in the capacity of texts to sway a
reader-evaluator when they embody positions to which the
reader might be favorably inclined or which the reader
might find repugnant.

Cooper asserted that the taped standardization 
sessions play the key role of "forming individual 
consciousness into a community consciousness." The 
video record of this work in progress puts flesh on the 
abstraction and models the process for beginners in order 
to cultivate a community of readers who will evaluate not 
only the student essays, but who will also study their own 
responses, keeping in mind the relationship of their 
responses to the criteria. 



WPA PRESENTATION ON EVALUATING 
WRITING PROGRAMS 



Speakers: Robert Christopher, Ramapo College,
New Jersey
Donald Daiker, Miami University, Ohio
Edward White, California State
University, San Bernardino

Introducer/

Recorder: John Schwiebert, University of
Minnesota



This session was organized by the National Council of 
Writing Program Administrators (WPA), and the panelists
wished to share their experiences as writing program
evaluators and to address salient issues of writing 
assessment as they pertain to writing program evaluation. 

Upon request, consultant-evaluators from the 
National Council of Writing Program Administrators will
conduct a writing program assessment for a college or
university. To prepare both themselves and the WPA
evaluators (usually a team of two) for the assessment,
schools are asked to complete a narrative "self-study" of
their writing program at least one month before the WPA
team visits. Robert Christopher distributed copies of the
self-study guidelines, which can be obtained from the
address given at the end of this abstract. The purpose of
the assessment is to help faculty and administrators
develop more effective writing programs appropriate to
their institutions' needs. Donald Daiker and Edward White 
described occasions when the WPA service assisted 
writing faculty on a campus to enlist high-level 
administrative support for innovative reforms in their 
writing programs. 

Most of the session focused on the topic of testing,
which, it was emphasized, is only one dimension of an
overall program assessment. To be effective,
institution-wide programs of assessment should be
appropriate to the particular needs, demographics, and aims
of the individual school. The challenge of deciding what is
appropriate underscores the relevance and value both of the
WPA assessment and of the self-study a school does before
the WPA visit. Panel members discussed some of the key
issues involved in each of the following kinds of testing:
admissions, placement, equivalency, and course exit.
Rising junior and value-added tests were also mentioned
but could not be discussed in detail in the time allotted.
Key points about each type of test are below:

Admissions Tests: Discussing the purposes of the
SAT verbal exam, White stressed that the SAT assesses
verbal aptitude and not writing ability. As such, it is
useful as a criterion for admissions but should not be a
basis for exempting students from freshman composition.

Placement Tests: Before actually developing a
placement test, a school should decide if it needs one.
Many institutions do need such exams to assure that
individual students receive writing instruction appropriate
to their abilities and experience. After a need has been
determined, a school should develop a test based upon its
curriculum, specifically upon what is taught in
freshman composition. Some schools borrow or adopt
tests that fail to mesh with their own institutional needs.
Only by examining its curriculum can an institution
rationally decide what it is testing for.

Equivalency Tests: These tests provide a special
service to students, and they differ fundamentally from
placement exams. The basic message of an equivalency
test is: "Show us that you (i.e., the student) are in control
of what we do in freshman comp and we'll let you out of it."
As such, equivalency tests must be based firmly on the
school's curriculum. Given its special purpose, the
testing instrument must also be more complex than one
used for placement.

Course Exit Tests: The course exit exam is a
common test that all students must pass in order to
complete a course (freshman composition or other).
Noting that such tests can discriminate against students
who write well but who are poor drafters or test takers,
White urged against tests being the only basis for exit. A
good exit exam covers materials and processes which
students have addressed in their class. White observed that
the greatest potential benefit of an exit test derives less
from the test itself than from the incentive it can provide
for departmental and interdepartmental faculty discussions
of writing and curriculum.

Institutions desiring more information on the WPA 
consultant-evaluator service should write to Professor Tori 
Haring-Smith at the following address: 

Rose Writing Fellows Program, Box 1962, 
Brown University, Providence, RI 02912. 

DEVELOPING AND EVALUATING A WRITING 
ASSESSMENT PROGRAM 

Speakers: Lorenz Boehm, Oakton Community
College, Illinois
Mary Ann McKeever, Oakton
Community College, Illinois

Introducer/ 

Recorder: Marion Larson, Bethel College, 
Minnesota 

Lorenz Boehm and Mary Ann McKeever addressed 
issues of designing, implementing, and evaluating an 
essay test currently being used by three Chicago-area
community colleges. This test is designed both to place 
students in appropriate composition courses and to 
determine if students in developmental or ESL 
composition courses are prepared to move on to Freshman 
Composition. 

Although the test has been used since 1984,
preparations for its implementation began in 1982, and
evaluation and refinement of test questions and procedures
is ongoing. This test replaced an objective test of
grammar and usage that was being used at the time.
During the planning process, prompts were developed and
pilot-tested, evaluation criteria were discussed, and reader
training methods were developed. In addition, those
developing the test sought to gain campus-wide support
and involvement from faculty, staff, and administration.

In the test, students are given two argumentative
topics from which to choose. With each topic, they are
given a context for writing and an audience for whom they
are told to formulate an essay arguing their position.
They are given 50 minutes to plan and write their essay.
Efforts are made to be fair to ESL students: topics are as
"culture free" as possible, prompts are worded simply,
ESL (and Learning Disabled) students are given an
additional 20 minutes to write their essay, and
specially-trained readers evaluate ESL and LD responses.

Each essay is holistically scored on a 6-point scale
by three readers, two of whom must agree in their
assessment. In cases of disagreement, an additional reader
may be used, and an appeals procedure is available to
students. These readers come from across the college and
all of them participate in frequent, extensive training to
be sure that they understand and agree upon criteria for the
essays they will be asked to evaluate. In training, as well
as before actual evaluation sessions, agreement among
readers is reached by examining, rating, and then
discussing sample essays; discussing criteria for scoring;
and then rating more sample essays.
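The two-of-three agreement rule described above might be sketched as follows. The abstract does not say whether "agree" means identical scores or how the additional reader's score is combined, so this sketch assumes exact matches and simply adds the fourth score to the pool; the function name `resolve_score` is hypothetical.

```python
from collections import Counter


def resolve_score(reader_scores, extra_reader_score=None):
    """Return the holistic score at least two readers share, or None.

    Assumptions (not stated in the source): agreement means an exact
    match on the 6-point scale, and the additional reader is consulted
    only when the first three readers all disagree.
    """
    if len(reader_scores) != 3:
        raise ValueError("exactly three initial reader scores are expected")
    pool = list(reader_scores)
    if extra_reader_score is not None:
        pool.append(extra_reader_score)
    if any(score not in range(1, 7) for score in pool):
        raise ValueError("scores must be integers from 1 to 6")

    # First look for agreement among the three initial readers.
    score, count = Counter(reader_scores).most_common(1)[0]
    if count >= 2:
        return score

    # Otherwise see whether the additional reader matches one of them.
    if extra_reader_score is not None and extra_reader_score in reader_scores:
        return extra_reader_score
    return None  # unresolved; the appeals procedure would apply
```

For instance, scores of 4, 4, and 5 resolve to 4, while 3, 4, and 5 stay unresolved until an additional reader matches one of them.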

Many benefits have come from Oakton's use of this
writing placement test. Primary among them is the
greatly increased dialogue among faculty, administrators,
staff, local high schools, parents, and students about
writing. Such cooperation is essential to the test's
success, because it has helped short-circuit potential
disagreement and has made members of the college
community more receptive to what the composition
faculty are trying to accomplish. It has also greatly fueled
writing-across-the-curriculum efforts on campus.

This test is continually being evaluated by Boehm, 
McKeever, and their colleagues to ensure that it is placing 
students appropriately, that the different prompts are 
eliciting responses of comparable quality, and that 
agreement among readers is high. The results thus far are 
quite positive: composition teachers are very satisfied that
students are being placed in the courses they need. Pilot 
testing prompts in composition classes and then carefully 
monitoring the ratings given to essays written in response 
to these prompts has helped ensure that different versions 
of the test are comparable; and evaluation criteria are kept 
consistent by frequent, ongoing training of essay raters. 



THE CHANGING TASK: TRACKING GROWTH
OVER TIME

Speaker: Catharine Lucas, San Francisco State 
University 

Introducer/ 

Recorder: Hildy Miller, University of Minnesota 

Catharine Lucas explained that traditional writing
assessment is designed to determine whether student
writing improves on a given specified task, whereas what
we need is a new kind of assessment that focuses on how
students change the task as they grow as writers. She
noted that we know that as writers develop, they formulate
new structures to represent tasks, and that they may be
awkward in their initial attempts at working with new
structures. For example, writers may experiment with
complex argumentative structures, abandoning the simpler
narrative structures at which they may be more skilled.
Ideally, writing assessment should recognize and reward
their attempts at more sophisticated formulations, even
when performance falls short, rather than constraining the
writing task in a way that only measures their ability at
what Moffett calls "crafting to given forms."

To debunk the myth that writing is a unitary
measurable construct and to show instead the impact of a
student's maturing task representation, she provided
samples of one student's writing that were submitted in
response to a longitudinal portfolio assessment of his
writing abilities from ninth to twelfth grade. During each
of the four years, the student was asked to produce an
essay as part of a school-wide assessment program. Four
readers then rank-ordered the four papers to determine the
writer's best and weakest work. While we would assume
that his ninth grade essay would be weakest and the
twelfth grade version the best, instead a different pattern
emerged: raters consistently rated the twelfth grade effort
the worst.

The reason for this surprising result was found
through closer inspection of the writer's choices in task
representation. In the three papers he submitted in grades
9, 10, and 11, the writer used the narrative form, a
structure that develops comparatively early, since 6th
graders are typically sophisticated story tellers. These
essays were successful, in part, because he was using a
familiar form. However, in the 12th grade essay he chose
to represent the task with an argumentative form, usually
a later developing skill, and one in which he was as yet
inexperienced.



Thus, Lucas concluded, we need a way to take a
writer's growth into account in assessment. Writers
experimenting with new structures face a harder task, one
which is likely to cause the writer initially to produce new
errors. Evaluators of writing, like judges of figure
skating, diving, and other "performance sports," need to
develop systematic ways of taking into account the
difficulty level of what the performer is attempting. In
order to account for changes in what is attempted we need
to study how writers develop both across and within
discourse domains. This will require a common language
for identifying domains and a way of charting what carries
over and what changes when writers move from one to
another. All discourse theorists polarize fictional and
non-fictional writing, or as Britton terms it, poetic and
transactional writing. As a result, we tend to assume that
the two are mutually exclusive: fiction writers rarely
include essays in fiction and in academia we rarely allow
poetic expression. In addition to these polar ends of the
discourse continuum, Lucas posits a middle category,
which draws freely on both fictional and academic styles,
and includes autobiography, belles lettres, the New
Journalism, and the personal reflection essay widely used
in classrooms and school assessments. While it is
relatively easy to chart a writer's development within
either the literary or the discursive domains, growth in
this middle domain is sometimes marked by shifts from
fictional techniques to extended abstract discourse, as in
the case presented. Whether students are moving within
the mixed domain, or from the literary end of the spectrum
to the discursive end, even when teachers recognize the
second piece as representing a later effort, they recognize
that the text is often less successful in what it attempts
than the earlier piece. This difference diminishes, of
course, as the student gains skill in handling discursive,
transactional writing.

To make possible more careful comparisons of what
changes as students move within and across domains,
Lucas has developed a method of defining tasks that draws
on work done by Freedman and Pringle ("Why Children
Can't Write Arguments") based on Vygotsky's (Thought
and Language) distinctions between focal, associational
and hierarchical arrangements, as well as on Coe's (Toward
a Grammar of Passages) method of charting relations
between propositions in a text. Lucas's system
distinguishes between four text patterns: (1) the
chronological core in which the student tells a story,
providing commentary at the end, a sign the writer is moving
toward abstraction; (2) the focal core in which the title
provides the subject of focus, with each sentence relating
to it, a sign that some notion of related ideas is emerging;
(3) the associational core in which we see chains of
associations forming, often with a closing commentary;
and (4) the hierarchical core, in which long-distance
logical ties supplement short-range connections between
complexly interrelated ideas, in a pattern typical of
advanced exposition in Western cultures. Using this
system, we may begin to see how writers build new
schema within these different domains, and begin to
reward them for these promising signs of growth in our
assessments of their writing abilities.






ASSESSING WRITING TO TEACH WRITING 

Speaker: Vicki Spandel, Northwest Regional
Education Laboratory

Introducer/ 

Recorder: Alice Moorhead, Hamline University 

Rarely are the lessons learned from large-scale
writing assessment translated into terms that make them
relevant for and useful to the classroom teacher. Yet
many of those lessons show how teachers can use
systematic writing assessment, especially when teaching
writing as a process. Large scale, district-wide writing
assessment is a costly process (at least 2.5 days for
training/assessing and between $2-$8 a writing sample);
however, as part of professional development programs,
most districts could justify the necessary time and budget.

In this presentation, Vicki Spandel discussed her
efforts, along with those of Richard Stiggins, to link
writing assessment and instruction through their work in
the Portland area for Northwest Regional Education
Laboratory. Spandel's current assessment method focuses
on using an analytic rating guide. She argues that
although it is difficult to separate form from content in
assessment, one can assess the features of writing, thus
her interest in an analytic guide that can be used
holistically to assess and to teach writing. Since teachers
are often afraid of assessment, using the rating guide can
ensure that what teachers value gets assessed and then gets
translated into practice.

As an assessment tool, Spandel's analytic rating
guide was generated from writing samples rather than
developed as a guide to impose upon writing. The guide
captures a more complete profile of the writing samples
when used along with holistic assessment. It
distinguishes six features of writing: ideas and content;
organization; voice; word choice; sentence structure;
writing conventions. Each feature is described and ranked
by degrees for a score of 5 or 3 or 1. Not only does this
analytic rating guide objectify expectations for writing but
it also offers a more defensible version of the subjective
process of writing assessment.
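A six-feature profile of the kind described above could be recorded as in the following sketch. The feature names and the 5/3/1 scale come from the abstract; the grouping into "strengths," "developing," and "weaknesses" and the function name `trait_profile` are illustrative assumptions, not part of Spandel's guide.

```python
# The six features named in the abstract.
TRAITS = (
    "ideas and content",
    "organization",
    "voice",
    "word choice",
    "sentence structure",
    "writing conventions",
)
ALLOWED_SCORES = {5, 3, 1}


def trait_profile(ratings):
    """Group the six features by score (grouping labels are hypothetical)."""
    missing = [trait for trait in TRAITS if trait not in ratings]
    if missing:
        raise ValueError("missing ratings for: " + ", ".join(missing))
    invalid = [trait for trait in TRAITS if ratings[trait] not in ALLOWED_SCORES]
    if invalid:
        raise ValueError("scores must be 5, 3, or 1 for: " + ", ".join(invalid))
    return {
        "strengths": [t for t in TRAITS if ratings[t] == 5],
        "developing": [t for t in TRAITS if ratings[t] == 3],
        "weaknesses": [t for t in TRAITS if ratings[t] == 1],
    }
```

A paper rated 5 on voice and 1 on writing conventions, with 3s elsewhere, would show one strength, one weakness, and four developing features, which is the kind of profile a single holistic score cannot convey.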

Using this guide with the holistic assessment 
process, particularly as an in-service workshop for
professional development, has two key advantages: 

(1) The assessment process promotes "real" 
agreement among teachers and professional raters 
about strengths and weaknesses in writing. 

(2) Teachers can re-enter the classroom to teach 
writing more explicitly on what "counts" in
writing and know this instruction is in concert 
with and reinforced by others. 







Not only can teachers use the analytic guide but so can
students. In peer review groups, students can focus their
writing efforts more directly with the six-feature guide as
"revision stations" for students to visit for specific
feedback on their writing. In Spandel's experience,
teachers welcome the use of this analytic guide for
assessment and for teaching writing. Many teachers
claim: "I'll never teach or think of writing in quite the
same way."



READER-RESPONSE CRITICISM AS A MODEL 
FOR HOLISTIC EVALUATION 

Speaker: Karl Schnapp, Miami University 
Introducer/ 

Recorder: Ann Hill Duin, University of 
Minnesota 

Karl Schnapp's session focused on the application of
reader-response theory to large and small scale holistic
assessment. Schnapp began by citing the work of Stanley
Fish, David Bleich, and Norman Holland as working
models for the holistic evaluation of student writing. He
then said that his own work is also based on Edward
White's theories of composition as a socializing and
individualizing discipline. From these theorists Schnapp
concluded that the best composition pedagogy views
students' writing from both social and individual
perspectives. In short, the interpretation and evaluation of
writing depends on qualities of the community in which
the writing was created and was evaluated.

Schnapp then described his specific project. His
model is based on three reading theories that lead to a
model for the holistic evaluation of writing. The first
theory is the "top-down" model of reading as discussed by
Holland and Bleich, the second is the "text-reader
interaction" theory (from information-processing theory)
as discussed by Rosenblatt, and the third is the
"communal association" theory as discussed by Fish.
Schnapp described his model in detail. Then he asked
conferees to fill out a survey identical to that used in his
study. The survey asked us to complete questions
regarding our perceptions and understanding of
composition/language arts. Next we read an essay written
by a freshman student and rated the student essay.
Finally, we completed a second survey in which we gave
information on the criteria we employ when holistically
evaluating student writing. As with Schnapp's results, we
had about 75% agreement in terms of the common goals
of the composition instructors present. Schnapp stated
that his research shows that writing teachers see writing as
helping students on more of a practical level than on an
aesthetic level.

The remainder of the presentation was a discussion
between Schnapp and the conferees. Key points that
emerged included: the need to ask readers about what
influences them as they evaluate papers; the need to
determine the evaluative standards for one's discourse
community; and the extent to which readers are influenced
by what they are thinking about while evaluating writing.



THE DISCOURSE OF SELF-ASSESSMENT:
ANALYZING METAPHORICAL STORIES

Speakers: Barbara Tomlinson, University of
California, San Diego
Peter Mortensen, University of
California, San Diego

Introducer/ 

Recorder: Anne O'Meara, University of Minnesota 

Barbara Tomlinson and Peter Mortensen gave
conferees attending this session an opportunity to become
students of their own writing processes. Much of the
session was devoted to composing, sharing, and analyzing
our own metaphorical stories about how we write.
Tomlinson and Mortensen feel that using metaphorical
stories in the classroom provides a means for students to
take responsibility for their own writing, to balance
personal with external assessment, and to center attention
on the writing process rather than the product.

Tomlinson began by sharing some of her own
metaphors for writing as well as some of those she found
in her study of over 2000 professional writers. Handouts
gave further examples from both professional and student
writers. The metaphors were sometimes relevant to
the process of writing as a whole and sometimes symbols
focusing on one aspect of writing. They ranged from clear
analogies (e.g., building, giving birth, cooking, mining,
gardening, hunting, getting the last bit of toothpaste) to
metaphors that needed elaboration like a "gusset" (a small,
irregular piece of material necessary for the construction of
a garment, but hidden) and the "lost wax process" (a way
of making a mold which then melts away when the
product is finished). Tomlinson stressed that metaphors
can reassure and guide her through composing problems as
well as help her describe these problems.

The speakers then simulated their technique for using
metaphorical stories in the classroom. As the participants
began to compose their own metaphorical stories, Peter
Mortensen asked some guiding questions to get us started,
encouraging us to think of metaphors we might use for
beginning writing, finishing writing, writing under
pressure, writing badly, writing well, generating ideas, and
so on. He suggested students could also use the guiding
questions (distributed on the handout) in interviews or in
collaboration to get started.

In the discussion that followed, Tomlinson and
Mortensen stressed that metaphors should be accepted and
explored, rather than judged. They may be original,
adopted, or enforced; they may be idiosyncratic,
contradictory, or even strike us as "bad." The important
thing is that we and our students look at what the effects
of writing metaphors are, what they imply about writing,
and how they match or might amplify our experience.
When they have students compare their metaphors to
those of professional writers, Tomlinson and Mortensen
minimize possible intimidation by emphasizing that the
purpose is to find similarities and common problems.

Finally, the speakers summarized their reasons for
using metaphorical stories in the classroom. In addition
to taking authority for their own writing and balancing
personal with external assessment, students also need to
develop better self-monitoring processes because many do
not have a language for thinking about their processes.
(Tomlinson's survey of 23 secondary and college writing
texts showed that there was very little figurative language
in these texts). The speakers have found that by
comparing metaphorical stories, students can gain
confidence and learn that other writers (including
professionals) may encounter similar problems. Students
begin to talk like writers and develop a stronger interest in
writing.



THE USES OF COMPUTERS IN THE ANALYSIS 
AND ASSESSMENT OF WRITING 

Speakers: William Wresch, University of
Wisconsin-Stevens Point
Helen Schwartz, Carnegie-Mellon
University

Introducer/

Reporter: Marie Jean Lederman, NTNW and
Baruch College, CUNY

William Wresch discussed the current state of the
field of computer analysis of student writing, dividing the
software programs into six different categories, each of
which has a different pedagogical orientation. The first
category is error checkers. These programs focus on
homonym confusions, sexist language, usage errors, and
infelicitous phrases. Some examples are Writer's Helper
(Conduit), Sensible Grammar (Sensible Software),
RightWriter (RightSoft), Ghost Writer (MECC), and
Writer's Workbench (AT&T).

The second category is reformatters which, rather
than find errors, make it easier for writers to find their
own errors. One of the first programs was Quill (DC
Heath) which included a combination of prewriting,
writing, and revising activities. For example, to help
students revise their work, it displayed each sentence of
their paper alone on the screen. Rather than make
statements about or changes in the sentence, the program
allowed students to look at each sentence in a new way.
Other newer reformatters include Ghost Writer (MECC)
and Writer's Helper (Conduit). The third category of
programs is audience awareness programs. These
programs include readability formulas and they pinpoint
vague references and other problems.

The fourth category is student conference utilities.
These computer programs try to help students develop
editing skills as they read each other's papers and "send"
comments to each other. Two examples are Quill and
Alaska Writer (Yukon-Koyukuk School District). The
fifth category is grading utilities, programs designed to
help teachers in the clerical aspects of paper grading.
Students turn in their work on disks, and the teacher uses
the computer to help grade the work. By creating ten or
twelve messages for major errors, teachers can respond
with just a keystroke or two to most of the mistakes they
are likely to see. Examples are the RSVP project
(Miami-Dade Community College) and Writer's Network
(Ideal Learning).
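The keystroke-to-message idea behind grading utilities can be illustrated with a few lines of code. This is a generic sketch, not the design of RSVP or Writer's Network: the key letters, the message wording, and the function name `expand_comments` are all hypothetical.

```python
# Hypothetical canned messages; a teacher would write ten or twelve of these.
CANNED_MESSAGES = {
    "f": "Sentence fragment: this group of words lacks a subject or a verb.",
    "r": "Run-on sentence: separate these independent clauses.",
    "a": "Agreement: check that the subject and verb agree in number.",
    "v": "Vague reference: make clear what this pronoun refers to.",
}


def expand_comments(keystrokes, messages=CANNED_MESSAGES):
    """Turn a string of single-key codes into full comments, skipping unknown keys."""
    return [messages[key] for key in keystrokes if key in messages]
```

Typing "fr" while reading a paper would thus attach the fragment and run-on messages without the teacher retyping either one.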

The last category is automatic graders. This is the
logical "next step" after grading utilities. Ellis Page of
the University of Wisconsin proved twenty years ago that
a computer could grade papers quite well based on a
formula of paper length, sentence length, level of
subordination, and word length. However, merely
assigning a grade isn't enough in a classroom situation in
which students expect not only a grade but a range of
responses from teachers. It might be possible, however,
to use such computer graders in large-scale assessment
programs. Wresch concluded that there are many decisions
to be made about how computers will be used in writing
analysis, but it is certain that there are already many
opportunities and, surely, many more to come.
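A formula built on the four surface features mentioned above might look like the following sketch. It is not Page's actual formula: the abstract gives no weights, so the weights are left as caller-supplied parameters, and the short list of subordinating words used as a proxy for "level of subordination" is an assumption made purely for illustration.

```python
import re

# Assumed proxy for subordination: a small, incomplete list of subordinators.
SUBORDINATORS = {
    "although", "because", "since", "unless", "while", "whereas", "if", "when",
}


def surface_features(text):
    """Compute paper length, mean sentence length, subordination rate, mean word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_sentences = max(len(sentences), 1)
    return {
        "paper_length": n_words,
        "sentence_length": n_words / n_sentences,
        "subordination": sum(w.lower() in SUBORDINATORS for w in words) / n_sentences,
        "word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


def predicted_grade(features, weights, intercept=0.0):
    """Linear combination of features; weights are hypothetical and caller-supplied."""
    return intercept + sum(weights[name] * features[name] for name in weights)
```

In practice the weights would have to be estimated from essays already scored by human readers, which is also why such a grader can only reproduce a score and not the range of responses students expect from teachers.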

Helen Schwartz began by discussing several
purposes of assessment: diagnosis and revision as well as
improved self-evaluation. The range of writing behaviors
which can be assessed are ideas, organization, rhetorical
presentation (purpose and audience assessment) and
grammatical correctness. In answer to the question, "How
can computer programs assess these behaviors for these
purposes?" she first gave a short answer, "No computer
program alone is now accurate or helpful enough" and
most of the existing programs may overwhelm the student
with too much information at once. Style checkers can
draw attention to problems, but the student must make the
decisions. And sometimes readability formulas can lead
students to vary sentence length by creating run-on
sentences and fragments. Schwartz pointed out that
"Computer programs are useful as delivery systems for
teacher, peer and self-assessment. They help students
become aware of problems in their writing and help them
to solve these problems." She gave four examples:

1) Prewriting programs such as "ORGANIZE" (Helen 
Schwartz, Wadsworth Publishing) can be used not 
only to help students see the shape of their papers but 
also to desensitize peer review. 

2) Templates, such as the self-evaluation form given in 
"Interactive Writing," help students assess strengths 
and weaknesses. 

3) "SEEN" (Schwartz, Conduit) includes a built-in 
bulletin board where peer review can take place. 

4) Programs for teacher and peer response to paper drafts, 
including (a) "Ch'** and Comments," developed by 
Christine Neuwirth at Carnegie Mellon, which 
facilitates discussion and peer review; (b) "PROSE" 
(Prompted Revision of Student Essays by Davis, 
Kaplan, Martin, McGraw Hill), which allows 
summary comments; comments embedded in the 
paper; revision notes; and handbook-like responses 
with an overview of the error, further explanation, and 
then interactive tutorials on each of 18 features; and 
(c) "Prentice Hall College Writer," which is a word 
processor that allows access to an on-line handbook 
and allows the insertion of comments that can include 
excerpts from the on-line handbook. 

The discussion that followed centered on examples of 
software described and demonstrated by the speakers. 



LEGAL RAMIFICATIONS OF WRITING 
ASSESSMENT 

Speaker: William Lutz, Rutgers University, 
Camden 

Introducer/ 

Recorder: Chris Anson, University of Minnesota 

William Lutz, who holds a law degree and is a 
member of the Pennsylvania Bar, addressed the importance 
of considering the legal constraints under which testing 
must operate. Lutz began by distinguishing the different 
kinds of testing programs: those conducted within an 
institution and those conducted outside the institution. 
External testing programs, such as those conducted by a 
school district or by a state agency, are governed by a 
series of laws and court decisions. Internal testing 
programs, such as course placement and proficiency 
testing, come under fewer legal constraints and exist, at 
present, in a legal nether world. However, there is enough 
legal precedent to warrant caution by anyone involved in 
any testing program. 

According to Lutz, testing programs may be attacked 
from a variety of legal approaches. Title VI of the Civil 
Rights Act prohibits any practice that would have the 
effect of restricting an individual, on the grounds of race, 
color, or national origin, "in the enjoyment of any 
advantage or privilege enjoyed by others receiving any 
service, financial aid, or other benefit." It is important to 
note that this law would judge a testing program by its 
effect, not its purpose. Moreover, the burden of proof in 
any legal action would fall on those conducting the test. 
Thus, under this law, testing programs with 
disproportionate effects on minority students are subject to 
close judicial scrutiny. If a state has a law guaranteeing 
an education to all its citizens, then all citizens have a 
property interest in an education. A testing program in 
that state can be attacked as a denial of a property right 
without due process. Such attacks have succeeded. 

Lutz pointed out that a testing program can be 
attacked as a denial of a liberty interest. Due process 
guarantees a right to liberty, and this liberty interest is 
infringed where a stigma attaches to the student as a result 
of the test. The 14th Amendment to the Constitution 
states that "No State shall . . . deny to any person 
within its jurisdiction the equal protection of the laws." 
While state laws may classify persons and treat them 
differently for various purposes, persons who are "similarly 
situated with respect to the purpose of the law" must be 
accorded equal treatment. In hearing cases brought under 
this Amendment, the court will ask two questions: (1) has 
the state acted with an unconstitutional purpose? (2) has the 
state classified together all and only those persons who are 
similarly situated? For example, if someone wanted to 
attack a placement test, there are two possible arguments 
under the 14th Amendment which might be used. First, 
the test itself can be attacked by arguing that while testing 
may be a legitimate means of classification, this particular 
test is so inadequate that one cannot possibly tell whether 
a particular student is ready for or has the ability to do 
college level work. A second approach is to attack the 
test's results by arguing that while the means used to 
classify a student may be legitimate, these means are so 
imprecise that one cannot possibly tell whether the 
student has been classified correctly. 

There are some vague areas here, or the legal nether 
world, as Lutz calls it. Before the due process requirements 
of the 14th Amendment can apply to a cause of action, 
two questions must be answered: (1) do the concepts of 
liberty or property encompass the asserted interest? and 
(2) if due process does apply, what formal procedures does 
due process require to protect the interest adequately? In 
other words, an individual must have a legitimate claim of 
interest before due process can apply. Thus far, a college 
education has not yet been found to be a benefit for which 
someone can assert a claim of entitlement. However, a 
claim of liberty could apply because testing may affect an 
individual's opportunity to choose his or her own 
employment. This issue is still open for litigation. 

Based upon a review of federal court decisions, Lutz 
offered the following Guidelines for Testing: 

1. The purpose of the test must be clearly 
delineated. The test must be matched with 
specific skills and/or specific curriculum 
objectives. 

2. Mere correlation between the test and the 
curriculum is not sufficient. There must be 
evidence, obtained from a regular process, that 
classroom activities are related to curriculum 
goals and test specifications. 

3. All test items must be carefully developed and 
evaluated to ensure that they conform to 
curriculum and instructional practices. 
Moreover, there must be evidence that any bias 
related to racial, ethnic, or national origin 
minority status has been eliminated. 

4. If possible, other measures of performance and 
ability should be used in conjunction with test 
results. 



5. Cut-off scores should be the result of a well- 
documented process of deliberation that conforms 
to state and federal statutory requirements. There 
should be no suggestions of arbitrariness or 
capriciousness in setting cut-off scores. 

6. Students should be informed well in advance of 
what it is they need to know to perform well on 
the test. Students should also be informed in 
advance as to the nature of the test. 

7. Options should be available for those students 
who fail the test. These should include, at the 
very least, the option to re-take the test, and 
institutional help to prepare and/or correct 
deficiencies. 



(1) a formal process for administering and 
conducting the testing program, including 
full documentation; 

(2) a formal review of the program conducted at 
regular intervals by an outside, impartial, 
objective reviewer. 

Lutz concluded by saying that we live in a litigious age, 
and prudence suggests that those involved in testing be 
professional and institute the guidelines and take the steps 
he outlined in his talk. 



SOME NOT SO RANDOM THOUGHTS ON THE 
ASSESSMENT OF WRITING 

Speaker: Alan C. Purves, The State University of 
New York, Albany 

As I near the end of a seven-year long comparative 
study of student performance in Written Composition 
sponsored by the International Association for the 
Evaluation of Educational Achievement, I should like to 
set forth some conclusions I have reached about writing 
assessment. 

1. Written Composition is an ill-defined domain. 
There have been a few recent efforts at mapping 
the domain through an examination of writing 
tasks and through an examination of perceived 
criteria, but in general these have been ignored in 
most assessments of student performance. Most 
assessments tend to rely on a single assignment 
selected at random. 



8. Students should have access to their test scores 
and a full explanation of those scores. 

Finally, Lutz suggested that anyone conducting a testing 
program should do the following immediately: 

1. Conduct a full, impartial review of the testing 
program, and document this review. 

2. Examine all the documentation in the program, 
and write any necessary additional documentation. 

3. Correct all the deficiencies identified in the 
program, and then document the process by 
which the deficiencies were identified and 
corrected. 

4. Institute two procedures as a permanent part of 
the testing program: 

2. Written composition is a domain in which 
products are clearly the most important 
manifestation; the texts that students produce 
form the basis for judgments concerning those 
students. Teachers and assessors know that and 
so do students. 

3. These products are culturally embedded, and 
written composition is a culturally embedded 
activity. The culture may be fairly broad or it 
may be relatively narrow, such as the culture of a 
Lee Odell or an Andrea Lunsford, but students 
inhabit and produce compositions that reflect 
those cultures. 

4. When a student writes something in a large scale 
assessment in the United States, what is usually 
written is a first draft on an unknown assignment 
that is then rated by a group of people who make 
a judgment as to its quality. The result is an 
index of "PDQ," Perceived Drafting Quality. 
Whether PDQ has any relation to writing 
performance or ability is unclear, although it is 
probably a fair index. 

5. Given the fact that what is assessed is PDQ, it is 
little wonder that students see writing 
performance as comprising adequacy of content, 
handwriting, spelling, grammar, and neatness. 
These are the features that secondary school 
students report as the most important features 
of the textual products of a school culture. 



NATIONAL AND INTERNATIONAL WRITING 
ASSESSMENT: RESEARCH ISSUES 

Speakers: Alan C. Purves, State University of 
New York at Albany 
Thomas Gorman, National Foundation 
for Educational Research, Great Britain 
Rainer Lehmann, Institute for 
Educational Research, Federal Republic 
of Germany 

Introducer/ 

Recorder: Wayne Fenner, University of Minnesota 

This session was the first of several sessions on 
research on international writing assessment. Alan Purves 
began with an overview of the background of the 
fourteen-nation Written Composition Study. Begun in 1980, 
this project is the most recent undertaken by the 
International Association for the Evaluation of Educational 
Achievement (IEA). Previous 
studies have examined the teaching and testing of science, 
math, reading, foreign language, and civic education. 
Unlike earlier subjects, the domain of written composition 
is a cloudy one: it is both an act of communication and 
an act of cognitive processing. Researchers, then, had 
first to define this domain, both empirically and 
theoretically. After this phase of domain specification, 
researchers designed a series of specific writing tasks and 
writing purposes to be included in the study. Third, a 
five-point scoring scheme was devised that would be valid 
and reliable across languages and cultures. Finally, raters 
were chosen and trained. 

Thomas Gorman discussed the results from a recent 
writing assessment program in England in order to clarify 
what can be learned from international studies but cannot 
be learned from separate, national writing assessment 
projects. The problem of domain specification seems to 
be culturally relative. The purpose of writing varies in its 
relation to general educational aims, and specific tasks 
may or may not reflect the kind of writing that is 
generally required of students in specific schools in 
particular cultures. There is, however, remarkable 
unanimity of assessment criteria and standards of 
performance across languages and cultures. Content, for 
example, as well as form, style, and tone appear to be 
rating factors utilized internationally. As a result of the 
IEA Study, we have learned much about the relative 
difficulty of various writing tasks, and we have gathered a 
great deal of information about background variables 
relative to writing performance. These variables include 
students' interest and involvement in life at school, plans 
for future education, amount of daily and weekly 
homework, and involvement of parents in the educational 
process. 

Rainer Lehmann discussed the methodology of 
comparative writing assessment, specifically the 
application of multitrait-multimethod analysis to the 
problem of validating the analytical scoring scheme used 
by all countries in the IEA Study. Although his 
discussion was limited to results from the Hamburg data, 
Lehmann provided information from a non-English 
language context that appeared to confirm the IEA 
study's methods and findings. 

TEACHING STRATEGIES AND RATING 
CRITERIA: AN INTERNATIONAL PERSPECTIVE 

Speakers: Sauli Takala, University of Jyväskylä, 
Finland 
R. Elaine Degenhart, University of 
Jyväskylä, Finland 

Introducer/ 

Recorder: Robin Murie, University of Minnesota 

This session reported on data gathered in the IEA 
(International Association for the Evaluation of 
Educational Achievement) study of Written Composition. 
The IEA study, now in its eighth year, is a large-scale 
examination of student writing in 14 countries (Chile, 
England, Finland, Hungary, Indonesia, Italy, the 
Netherlands, Nigeria, New Zealand, Sweden, Thailand, the 
USA, Wales, W. Germany). An internationally developed 
scoring system was used to rate the writing tasks in terms 
of organization, content, style, tone, mechanics, and 
handwriting. In addition, students, teachers, and schools 
filled out questionnaires. These data are now being 
examined in a number of ways. 

Sauli Takala, one of the coordinators of this study, 
described patterns of agreement and disagreement among 
raters' application of a five-point rating scale (which 
included the criterion "off the topic"). He found that raters 
behaved in a uniform manner. Most of the time, two 
readers were within one point of being in full agreement 
with each other. Beyond a one-point discrepancy on the 
rating scale, there was a significant drop in frequency (2 
points off: 5-12%; "off the topic": 2.5-7.5%; 3 points off: 
2.5-5%). He then discussed where on the scale these 
discrepancies were occurring. Agreement was greatest at 
the high end of the scale and least likely in the low-middle 
range of scores. 
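
The agreement pattern described above (full agreement, one point apart, two or more points apart, and "off the topic" disagreements) can be tabulated with a short sketch like the one below. It is illustrative only: the data layout, the OFF_TOPIC marker, and the decision to count "off the topic" as a separate category rather than as a scale point are assumptions, not the IEA study's documented procedure.

```python
from collections import Counter

OFF_TOPIC = "off"  # hypothetical marker for an "off the topic" rating


def agreement_profile(pairs):
    """Return the proportion of essays at each level of rater discrepancy.

    `pairs` is a list of (rater_a, rater_b) ratings for the same essays.
    Each rating is an integer scale point or OFF_TOPIC. Keys of the
    result are the point difference (0, 1, 2, ...) or the string
    "off_topic_mismatch" when only one rater chose "off the topic".
    """
    counts = Counter()
    for a, b in pairs:
        if a == OFF_TOPIC and b == OFF_TOPIC:
            counts[0] += 1                      # both raters agree
        elif OFF_TOPIC in (a, b):
            counts["off_topic_mismatch"] += 1   # only one rater said off topic
        else:
            counts[abs(a - b)] += 1             # numeric discrepancy in points
    total = len(pairs)
    return {k: v / total for k, v in counts.items()} if total else {}


def within_one_point(profile):
    """Share of essays on which raters agreed fully or differed by one point."""
    return profile.get(0, 0.0) + profile.get(1, 0.0)
```

Breaking the same tabulation down by score level would show where on the scale the discrepancies cluster, which is the second question Takala addressed.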

Takala then discussed where the rating of "off topic" 
appeared. In early discussions with colleagues, it was 
anticipated that this rating would pair up with ratings at 
the high end of the scale (an essay would be so creative as 
to elicit either "very good" or "off topic"). In fact, just 
the opposite was true: "off topic" appeared at the low end 
of the scale with "poor." Surprisingly, it also occurred in 
the middle range. Takala noted that perhaps some raters 
were unsure of how to score such essays and so chose a 
middle ground. In general, similarities between raters 
outweighed differences, lending credibility to further 
comparisons. 

Elaine Degenhart, another coordinator of the IEA 
Study of Written Composition, looked at relationships 
between writing instruction and student performance, 
using data from the teacher questionnaires and 
questionnaires on the background and curriculum of the 
schools involved in the IEA study. The purpose of her 
work was to identify some patterns in instructional 
approaches and to determine how well the variables that 
show these approaches discriminate between low, middle, 
and high achieving classes. The four main approaches 
that emerged were product, process, reading-literature, and 
a less well defined skills-oriented approach with emphasis 
on product. Based on mean scores on the writing tasks, 
classes were divided into achievement levels: 25% high, 
50% middle, 25% low. The top two instructional 
approaches for each country were then examined in terms 
of how well they discriminate for the three levels of 
classes. Degenhart reported on findings from four of the 
countries: Chile, Finland, New Zealand, and the U.S. 
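
The division of classes into achievement levels described above can be illustrated with a minimal sketch. The rank-based cut, the handling of class counts not divisible by four, and the treatment of ties are assumptions made for illustration; the abstract does not say how the IEA analysis drew the boundaries.

```python
def assign_levels(class_means):
    """Label each class 'low', 'middle', or 'high' by rank of its mean score.

    `class_means` maps a (hypothetical) class identifier to the class's
    mean score on the writing tasks. The bottom quarter of classes is
    labelled 'low', the top quarter 'high', and the rest 'middle'.
    Quarter sizes are rounded down, and ties are broken by sort order.
    """
    ranked = sorted(class_means, key=class_means.get)  # lowest mean first
    n = len(ranked)
    quarter = n // 4
    levels = {}
    for i, cls in enumerate(ranked):
        if i < quarter:
            levels[cls] = "low"
        elif i >= n - quarter:
            levels[cls] = "high"
        else:
            levels[cls] = "middle"
    return levels
```

With the levels assigned, each instructional-approach variable can then be compared across the three groups to see how well it discriminates among them.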

The top two teaching strategies found for Chile were 
(1) a strongly student-centered approach with a process 
orientation and (2) a stronger product orientation. Here it 
appeared that low-achieving students had more 
process-centered teaching, whereas the product-centered approach 
distinguished well for the middle group. In Finland, the 
top two teaching strategies were (1) a reading-literature 
approach and (2) a process approach. The process 
approach did not distinguish between the top and bottom 
groups; the reading-literature approach was positive for 
low-achieving students. In New Zealand, the top two 
were (1) a teacher-centered reading/literature approach and 
(2) a less clearly defined approach leaning toward process. 
Both discriminated between all three levels. In the United 
States, the top two approaches were (1) a structured 
reading/literature approach and (2) a strong student-centered 
product orientation. The product orientation was high for 
the low-level students. 

Questions centered around possible interpretations of 
these findings. Degenhart was careful not to draw 
premature conclusions or make quick generalizations. 
From the discussion it became clear that a greater 
understanding of the background situation in each country 
would help with the interpretation of why classes were 
receiving a particular type of writing instruction. 

EFFECTS OF ESSAY TOPIC VARIATION ON 
STUDENT WRITING 

Speakers: Gorden Brossell, Florida State 
University 
Jim Hoetker, Florida State University 

Introducer/ 

Recorder: Laura Brady, George Mason University, 
VA 

Gorden Brossell and Jim Hoetker presented the 
results of a study designed to analyze the ways in which 
systematic variations in essay topics affected the writing 
of college students under controlled conditions. To 
explore the question of whether a change in topic makes a 
difference in the quality of student response, Brossell and 
Hoetker chose extremes of topic and student population. 
The population consisted of remedial students and honors 
students writing in response to a regular course 
assignment. The year-long study (May 1987-April 1988) 
was based on 557 essays collected from four Florida sites: 
the University of Florida, Miami-Dade Community 
College, Valencia Community College, and Tallahassee 
Community College. 

The general essay topic for this project, "The most 
harmful educational experience," was written according to 
procedures developed by Brossell and Hoetker in their 
previous research on content-fair essay examination topics 
for large scale writing assessments (CCC, October 1986). 
Brossell and Hoetker then varied this topic in two ways: 
(1) they controlled the degree of rhetorical specification 
and (2) they changed the wording to invite subjective and 
objective responses. These variations yielded four 
versions of the topic: 

• Minimal rhetorical specification requesting an 
impersonal discussion 
• Minimal rhetorical specification requesting a 
report of personal experience 
• Full rhetorical specification requesting an 
impersonal account 
• Full rhetorical specification requesting a report of 
personal experience 
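
The four versions listed above form a two-by-two design: two levels of rhetorical specification crossed with two kinds of requested response. A minimal sketch of that crossing follows; the factor names and level labels are hypothetical shorthand chosen for illustration, not terms used by Brossell and Hoetker.

```python
from itertools import product

# Hypothetical labels for the two factors that were varied.
SPECIFICATION = ("minimal", "full")     # degree of rhetorical specification
STANCE = ("impersonal", "personal")     # kind of response the wording invites


def topic_versions():
    """Return every combination of the two factors, one dict per version."""
    return [{"rhetorical_specification": spec, "stance": stance}
            for spec, stance in product(SPECIFICATION, STANCE)]
```

Crossing the factors this way is what allows the effect of each one to be examined separately from the other.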

The essays written in response to these topic variations 
were scored holistically on a 7-point scale by experienced 
graders; the scale included operational descriptions for four 
levels of quality (1, 3, 5, 7) and left the other three levels 
(2, 4, 6) unspecified in order to give the raters greater 
flexibility. The essays were also scored analytically 
according to ten items in three categories: (1) 
development, (2) voice/speaker/persona, and (3) 
readability. 

Although the original plan had been to gather samples 
from extreme student populations (high- and low-ability), 
differences between institutions in the average quality of 
student writing were noticeable: many "low-ability" 
students wrote as well as or better than students ranked as 
"high-ability." As a result, the sample fell into a 
bell-curve distribution. The researchers concluded that there is no 
evidence from either the holistic-scale scores or the 
analytic-scale scores that even gross variations in phrasing 
affect either the quality of student responses or the nature 
of student-topic interaction. Other conclusions: the 
appearance of first-person voice is significantly higher in 
essays written in response to topics calling for accounts of 
personal experience, but it is unaffected by the degree of 
rhetorical specification. 

In a discussion following the presentation of the 
research, Brossell and Hoetker mentioned plans for future 
work that include a study to evaluate the effect of content 
variation in essay topics when wording and rhetorical 
specification are held constant. They also plan to develop 
their analytic scale further, based on additional essays 
written at greater leisure and revised, and representing 
average and high-ability students as well as low-ability 
students. With revision and development to make the 
scale reliable and "transportable," the analytic scale might, 
according to Brossell and Hoetker, have the potential to 
become an alternative to the single-digit holistic score. 



WHAT SHOULD BE A TOPIC? 

Speakers: Sandra Murphy, San Francisco State 
University 
Leo Ruth, University of California, 
Berkeley 

Introducer/ 

Recorder: Robert L. Brown, Jr., University of 
Minnesota 

Taking a cue from the Bay Area Writing Project's 
collective spirit, Sandra Murphy and Leo Ruth rejected the 
usual panel format by opening the session to audience 
discussion of issues influencing subject selection for 
holistic scoring. They directed the session with six 
questions (treated at greater length in their recent Ablex 
book Designing Writing Tasks for the Assessment of 
Writing). Their questions examined the dual problem 
facing assessment designers: naming a subject and 
providing the writers with instructions about what to do 
with it. In part, the session provided a forum for a 
critique both of the entire agenda of holistic scoring and of 
the specifics of assessment design. But it also allowed 
Murphy and Ruth a format in which to report some of the 
findings from their work. 

The six questions treat variously the 
syntactico-semantic structure of the items, the discourse structures 
suggested, the power relationships established between 
test(er) and writer, and the cultural knowledge 
presupposed. The six questions and comments from the 
presenters and audience are as follows: 

1. How much information should be provided about the 
subject? 

Murphy and Ruth's findings suggest that a simple 
referring phrase (NP) elicited less rich responses than 
a full proposition. When a predicate was provided, 
writer responses were more "reasonable and 
responsible." 

2. How does specification of a subject constrain 
response? 

Discussion demonstrated the range of possible 
constraints: discourse type, qualification, 
quantification, text structure, style, and (always) 
ideology, explicit and implied. 

3. How does knowledge of the subject affect 
performance? 

The session members soon raised the meta-question 
of whether any topic could avoid requiring "specialized 
knowledge," and therefore whether holistic essay 
testing could be free from political bias. Generally, 
Murphy and Ruth and the session members agreed 
that knowing a lot about the topic was a great 
advantage, and the "knowledge" extended well 
beyond simple propositional knowledge to 
familiarity with cultural discourses. 

4. Should students be given options in selecting 
topics? 

Generally, options invite confusion. Items may not 
be equally difficult. Students may not be wise in 
selecting, picking complex topics and writing 
complex, bad essays. Confusion over the selection 
process may penalize students. 

5 . How do rhetorical specifications affect performance? 

Students did not seem to be helped by suggestions of 
rhetorical type. Typically, they ignored them or 
found that the problem of executing the rhetorical 
command interfered with their writing in general. 



6. To what extent should admonitions about the 
writing task be mentioned? Time limits, pitfalls, 
and so on? 

Again, the political demands of the writing 
assessment as an institution overwhelm the testers' 
attempts to help: students write the essay they have 
in mind, ignoring the instructions or finding 
themselves confounded by them. 

The session eloquently expressed reservations about the 
ideology of holistic scoring and mass assessment in 
general. The conferees pointed to the inherent artificiality 
of pretending to write authentic prose while authentically 
demonstrating familiarity with academic conventions. 
They agreed that students who know the conventions of 
testing will, predictably, do best. 

CLASSROOM RESEARCH AND WRITING 
ASSESSMENT 

Speaker: Myles Meyers, California Federation of 
Teachers 

Introducer/ 

Recorder: Deborah Appleman, Carleton College, 
Minnesota 



Myles Meyers addressed the issue of large scale 
assessment from the perspectives of the K-12 
administrator and classroom teacher. From these 
perspectives he finds large scale assessment to be 
problematic and often ill-advised. The enormous diversity 
of schools makes it difficult to capture the current "state 
of the art." Myers also contended that state assessments 
such as California's CTBS work against teaching as well 
as against the professionalization of teachers. 

Meyers discussed at length the seemingly 
reductionist quality of large scale assessment. Although 
recent research on writing maintains that writing is a 
multiple construct, time and financial constraints limit the 
constructs that can be examined. The construct that is 
employed to define writing thus becomes the primary 
focus for a particular grade (for example, autobiography in 
grade 10). In our effort to handle the assessment task by 
limiting constructs, our definition of writing, as well as 
its instruction, therefore becomes uni-dimensional. 
Moreover, because of the inevitable prescriptive quality of 
the interpretation of assessment results as well as teachers' 
lack of involvement and consequently lack of ownership 
in the entire assessment process, Meyers claimed that 
statewide assessments can destroy teaching-as-inquiry and 
harm student learning. 

Meyers then presented several suggestions for 
involving teachers in the assessment process. He 
emphasized the importance of having teachers participate 
significantly through summer institutes at university 
settings. He also underscored the importance of viewing 
assessment as a process of inquiry, one in which 
disagreement is as important as agreement. To illustrate 
the value of assessment as inquiry, Meyers handed out 
three sample student papers and asked the audience to rank 
them as high, middle, and low. The resulting scoring was 
quite discrepant, as were the reasons offered for the 
rankings. Meyers then discussed the value of discrepancy 
in our aim to improve literacy for all children. If 
agreement is not treated as the ultimate goal in 
assessment, discrepancy can lead to a fruitful dialogue 
about our underlying assumptions about teaching good 
writing as well as about its evaluation. 

Meyers pointed out that dialogues or debates such as 
those generated by the conferees when they were asked to 
rank the papers were a critical aspect of the assessment 
process. He stressed the importance of having classroom 
teachers as active participants in an on-going debate on 
assessment, rather than as recipients of an administrative 
decision to employ a particular large scale assessment 
instrument. He then handed out six additional student 
papers, and asked conferees to rank them and then to 
discuss the rankings in pairs. As with the first exercise, 
the rankings were widely discrepant. Meyers illustrated 
how this kind of exercise can be used to encourage 
teachers to think explicitly about their pedagogy and also 
described several ways in which the ranking of student 
writing can be employed to generate discussion among 
teachers. For example, he has asked teachers to devise 
sample lessons for students whose papers they have 
ranked. 

Meyers ended his provocative discussion by 
suggesting several ways in which writing can be viewed 
as a speech act and as a collaborative social event. He 
discussed the differences and similarities between 
conversation and written presentation. Meyers concluded 
his talk with the following thought: "When you teach 
people how to write, you teach them a new definition of 
themselves." 



COMPUTERS AND THE TEACHING OF 
WRITING 

Speakers: Michael Ribaudo, The City University 
of New York 
Linda Meeker, Ball State University 

Introducer/ 

Recorder: Donald Ross, University of 
Minnesota 

Both speakers discussed the National Project on 
Computers and College Writing, a three-year project 
supported by the Fund for the Improvement of 
Postsecondary Education and The City University of New 
York. This project is coordinated by three of the NTNW 
directors: Michael Ribaudo, Harvey Wiener, and Karen 
Greenberg. 

Michael Ribaudo explained the goals of the project: 
it will (1) identify outstanding college programs that have 
incorporated computers in freshman-level composition 
courses, (2) conduct research on the impact of computers 
on students' writing abilities, (3) develop and disseminate 
reports on this research and on instructional philosophy 
and methodology, and (4) host a national conference 
showcasing the programs and the research. 

Ribaudo noted that, at this point, fifteen 
colleges and universities from across the country are 
involved in the project. They are developing research 
designs that will pair three "computer" sections and three 
traditional sections at each 
site. Some of the research instruments to assess students 
include essay tests (scored holistically and analytically), 
multiple-choice tests, and questionnaires on writing 
anxiety and writing attitudes. 

Linda Meeker discussed her university's participation 
in the project, and summarized the efforts that Ball State 
has already made in evaluating the effects of computers on 
the teaching and learning of writing. 

She described three of her recent studies. The first 
study assessed student attitudes toward using 
computer-assisted instruction (CAI) in basic writing classes. She 
found that CAI proved effective in terms of students' time 
management and that basic writing students developed 
positive attitudes toward CAI. Her second study focused on 
using "invention" software to assist the composing 
processes of basic writing students. Results indicated 
highly positive student attitudes and a noticeable 
improvement in students' ability to focus on their topics. 
Meeker's third study examined the revising strategies of 
basic writing students. This study revealed that students 
spent significant amounts of time in a variety of 
prewriting and revising activities, but it was unclear 
whether the text manipulations were clearly related to a 
greater flexibility provided by CAI. However, Meeker did 
find that the computer enabled students to do more 
frequent, and more productive, pre-editing. 

Next, Meeker described some of the studies that will 
be conducted by the National Project on Computers and 
College Writing. She noted that data collected from these 
large-scale assessments will either confirm or call into 
question the results of her studies. Students' attitudes 
toward CAI and the effectiveness of word processing as a 
tool for inventing, composing, revising, and editing will 
be evaluated. Moreover, each of the project sites will 
examine the comparative effectiveness of different 
hardware and software configurations available at their 
institutions.

For further information on this project or the 
conference, which is scheduled for Spring 1990, write to 
Dean Ribaudo at CUNY, 535 East 80th Street, New 
York, NY 10021. 



New works on writing assessment by NTNW members: 

THE IEA STUDY OF WRITTEN COMPOSITION: THE INTERNATIONAL WRITING 
TASKS AND SCORING SCALES 

Edited by T.P. Gorman, A.C. Purves, and R. E. Degenhart 
Pergamon, Oxford, England, 1988 

THE EVALUATION OF COMPOSITION INSTRUCTION, Second Edition 
by Barbara Gross, Michael Scriven, and Susan Thomas 
Teachers College Press, NY, 1987 

DESIGNING WRITING TASKS FOR THE ASSESSMENT OF WRITING 
by Leo Ruth and Sandra Murphy 
Ablex, Norwood, NJ, 1987 







How You May Participate 



NTNW needs the active participation of those who have a concern with writing skills 
assessment, whether as specialists, administrators, or classroom teachers. If you 
wish to become a member of the network or to learn more about who we are, what we 
plan to be doing, and how our plans could involve you, just complete the coupon 
below and return it to us along with materials describing yourself and your 
professional interests in writing instruction and assessment. 



Name 



Position 



Institution 
Address 



___ I would like to be on NTNW's mailing list. 

___ I would be willing to share information about my writing assessment 
program with NTNW. 



Please return to: 

Karen Greenberg, Director 

National Testing Network in Writing 

Office of Academic Affairs 

The City University of New York 

535 East 80th Street 

New York, New York 10021 

(212) 794-5446 






